The document discusses ISOcat, a registry of linguistic data categories that aims to provide standardized terminology for language resources. It describes how data categories in ISOcat can be uniquely identified with persistent IDs and referenced in XML and RDF resources to make the semantics of those resources explicit. It also outlines how relationships between data categories are represented and how the ISOcat registry can be used to find, create, and submit new data categories.
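As a minimal sketch of how a resource might reference data categories by persistent ID, the snippet below collects such references from a small XML fragment. The `dcr:datcat` attribute name, the namespace URI, the PID pattern, and the sample element names are assumptions made for illustration and are not taken from the summarized document; the category numbers are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for data category references (illustrative only).
DCR_NS = "http://www.isocat.org/ns/dcr"

# Hypothetical resource: two features point at a data category PID, one does not.
SAMPLE = """<lexicon xmlns:dcr="http://www.isocat.org/ns/dcr">
  <feature name="pos" dcr:datcat="http://www.isocat.org/datcat/DC-1345"/>
  <feature name="gender" dcr:datcat="http://www.isocat.org/datcat/DC-1297"/>
  <feature name="note"/>
</lexicon>"""


def collect_datcat_refs(xml_text):
    """Map each element's 'name' attribute to the data category PID it references."""
    root = ET.fromstring(xml_text)
    refs = {}
    for elem in root.iter():
        # Namespaced attributes are addressed as "{namespace}localname".
        pid = elem.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:
            refs[elem.get("name")] = pid
    return refs


refs = collect_datcat_refs(SAMPLE)
```

Elements without a reference, such as the `note` feature here, are simply skipped, so the resulting mapping lists only the parts of the resource whose semantics have been made explicit.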
On the way to a Relation Registry for ISOcat data categories (Menzo Windhouwer)
This document discusses the RELISH project which aims to harmonize terminology between the GOLD ontology and the ISO Data Category Registry (ISOcat). It describes how the RELcat relation registry can be used to store relationships between data categories from different sources. RELcat implements a relation registry based on RDF and SPARQL and currently contains relationships between categories from ISOcat, Dublin Core, RELISH and GOLD. The document outlines some of the challenges in mapping between different category systems and examples of the relationship types that can be represented in RELcat.
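The idea of a relation registry can be sketched as a set of subject-predicate-object statements that link categories from different sources. The example below uses plain Python tuples rather than an RDF store or SPARQL; the prefixes, category identifiers, and relation names (`rel:sameAs`, `rel:almostSameAs`) are hypothetical stand-ins and are not claimed to be RELcat's actual vocabulary or content.

```python
from collections import namedtuple

# One relation between two categories, modelled on an RDF triple.
Relation = namedtuple("Relation", "subject predicate object")

# Hypothetical relations between categories from different sources.
RELATIONS = [
    Relation("isocat:DC-1345", "rel:sameAs", "gold:PartOfSpeechProperty"),
    Relation("isocat:DC-2482", "rel:almostSameAs", "dc:language"),
    Relation("relish:Noun", "rel:sameAs", "isocat:DC-1333"),
]


def related(category, relations, predicates=("rel:sameAs",)):
    """Return categories linked to `category` by any of `predicates`.

    The relation types used here are treated as symmetric, so a match
    on either the subject or the object side counts.
    """
    found = set()
    for rel in relations:
        if rel.predicate not in predicates:
            continue
        if rel.subject == category:
            found.add(rel.object)
        elif rel.object == category:
            found.add(rel.subject)
    return found


strict = related("isocat:DC-1333", RELATIONS)
loose = related("dc:language", RELATIONS, predicates=("rel:sameAs", "rel:almostSameAs"))
```

Passing a wider set of predicates is one way a user could choose how loosely two category systems should be considered equivalent for a given search.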
This document provides an overview of a presentation on representing and connecting language data and metadata using linked data. It discusses the technological background of linked data and the collaborative research opportunities it provides for linguistics. It also outlines prospects for using linked data in linguistics by connecting annotated corpora, lexical-semantic resources, and linguistic databases to build a linguistic linked open data cloud.
This document discusses the Web Ontology Language (OWL). It begins by providing motivation for OWL, noting limitations of RDF and RDF Schema in areas like expressiveness. It then outlines the technical solution of OWL, including its design goals of being shareable, changing over time, ensuring interoperability, and balancing expressiveness with complexity. Finally, it introduces the three dialects of OWL - OWL Lite, OWL DL, and OWL Full - and their different levels of expressiveness and reasoning capabilities.
This presentation discusses the value of inferred knowledge over LOD and presents a new version of FactForge, a reason-able view described as the biggest body of heterogeneous generic knowledge on which inference is performed. It shows examples of inferred statements across LOD datasets.
Representing Translations on the Semantic Web (Oscar Corcho)
This document proposes representing translations of natural language descriptions associated with ontology elements and linked data on the Semantic Web. It discusses current mechanisms like RDFS, SKOS and SKOS-XL that allow some representation of multilingual labels but have limitations. It then introduces the lemon model, an RDF-based ontology-lexicon model, as a basis for the proposal. A typology of translation relations is presented, distinguishing between literal and cultural equivalence translations. The proposal is a lemon module that represents translation relations as well as metadata like translation type and confidence. Examples demonstrate representing literal and cultural equivalence translations between lexicons. The approach aims to directly and explicitly represent translations while maintaining moderate complexity.
The document discusses the Web Ontology Language (OWL). It provides an overview of OWL, describing its three sublanguages - OWL Lite, OWL DL, and OWL Full - and their increasing expressiveness and reasoning complexity. The document also reviews the requirements for ontology languages and how OWL builds upon XML, RDF, and RDF Schema as the ontology language for the Semantic Web.
The document discusses the need for semantic technologies like ontologies to help address information overload by allowing machines to extract knowledge. It describes the evolution of semantic technologies, starting with XML providing syntactic interoperability, RDF providing a semantic grammar through assertions and relationships, and RDFS providing semantic interoperability through hierarchies and taxonomies for defining vocabulary. However, RDFS is not expressive enough to model all ontologies, so OWL was created by W3C to further extend RDFS while addressing complexity through different profiles like OWL Lite, DL, and Full.
This document discusses semantic technologies and digital data processing. It provides an overview of semantics and the semantic web, including XML, RDF, OWL, SPARQL, ontologies, and data models. It also discusses capturing semantics in XML documents, OWL, RDF schema, semantic web applications like cartographic searching, SKOS for knowledge organization systems, and the SKOS Play visualization tool.
This presentation is about:
- Introduction to OWL
- OWL Basics
- Class Expression Axioms
- Property Axioms
- Assertions
- Class Expressions - Propositional Connectives and Enumeration of Individuals
- Class Expressions - Property Restrictions
- Class Expressions - Cardinality Restrictions
The document provides an introduction to Dublin Core metadata, including:
1) Dublin Core is a set of metadata standards including 15 simple elements and over 50 qualified elements for describing resources.
2) Dublin Core metadata can be used to improve resource discovery and is recommended for metadata harvesting and the semantic web.
3) Custom mappings can be made from other metadata standards like LOM to the Dublin Core Abstract Model to make metadata interoperable.
1) Ontologies play a key role in semantic digital libraries by supporting bibliographic descriptions, extensible resource structures, and community-aware features.
2) Semantic digital libraries integrate information from various metadata sources and provide interoperability between systems using semantics.
3) Key ontologies for digital libraries include bibliographic ontologies, structure description ontologies, and community-aware ontologies that model folksonomies and social semantic collaborative filtering.
This document discusses using semantic web technologies to enhance digital libraries. It describes how ontologies like MarcOnt can lift legacy metadata into a semantic format to improve search and interoperability. The JeromeDL project is presented as a case study that uses MarcOnt and other ontologies to power semantic search and sharing features for bibliographic descriptions. Semantic technologies allow digital libraries to better integrate information and provide more robust, user-friendly search interfaces.
Part 4 of tutorials at DC2008, Berlin. (International Conference on Dublin Core and Metadata Applications). See also part 1-3 by Jane Greenberg, Pete Johnston, and Mikael Nilsson on DC history, concepts, and other schemas. This part focuses on practical issues.
JeromeDL is a digital library built on semantic web technologies that aims to integrate and interconnect information from different sources. It allows users to semantically search and browse content, and also contribute annotations and social metadata. JeromeDL supports various bibliographic metadata formats and delivers semantic services like semantic search, collaborative filtering, and communication between digital library instances. Evaluations found JeromeDL can complete some tasks up to 50% faster than other services by automating processes.
The document provides an overview of the work done at DERI Galway, including developing technologies like SIOC, ActiveRDF, and BrowseRDF to interconnect online communities and enable semantic applications. It also describes JeromeDL, a digital library system that uses semantic metadata and services to allow users to collaboratively browse and share knowledge.
This document discusses improving digital library usability through semantic and social technologies. It presents the problem statement that digital library users lack guidance from librarians and connections with other users and information sources. The hypothesis is that semantic and social technologies can improve information discovery over classic approaches. It outlines an architecture and ontologies for a semantic digital library called SemDL. It also describes a prototype semantic digital library system called JeromeDL and its features like advanced search, social networking, collaborative filtering and browsing. It evaluates JeromeDL against a classic digital library and finds JeromeDL has better usability and user experience.
The Dublin Core 1:1 Principle in the Age of Linked Data (Richard Urban)
Presentation given at the International Conference on Dublin Core and Metadata Applications, Austin, TX. October 9, 2014. See associated paper http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/263
JeromeDL is a social semantic digital library that allows users to:
1) Contribute metadata and annotations that are interconnected within the library and on the semantic web.
2) Perform personalized, collaborative search and browsing based on semantics.
3) Access and share knowledge through integrated social networking features and extensible access controls.
- The document discusses representing WordNet as linked data using RDF. It reviews previous approaches using RDF to represent WordNet and its relations.
- A key approach discussed is the Lemon model, an RDF model for representing lexical information and linking it to ontologies. The Lemon model represents the connections between a lexical entry, sense, form, and reference in an ontology.
- Representing WordNet in RDF and linking it to other semantic web resources allows WordNet to be accessible on the web and integrated with other applications in a standard way.
The document discusses NIF (NLP Interchange Format), which aims to integrate natural language processing (NLP) tools via a common output format. NIF addresses issues with current NLP pipelines by allowing tools to be combined ad hoc. It represents NLP annotations as URIs in RDF so that the output of different tools can be merged. Ontologies provide interfaces to integrate different layers of annotation. The goals are to make NLP components interchangeable and to reuse past annotations.
The document discusses Oracle Semantic Technologies for storing and querying RDF data. It provides an overview of how RDF data is stored and organized in Oracle databases using ID triples and URI mapping tables. It describes how the SEM_MATCH SQL function allows querying RDF data using a SPARQL-like syntax. Optimization techniques for SEM_MATCH queries include indexes and materialized views. The core entities in the Oracle Semantic Store include semantic networks, models, rulebases, and entailments. Functionality includes bulk loading, incremental loading, SPARQL querying, and built-in or user-defined inference rules.
RDF and other linked data standards: how to make use of big localization data (Dave Lewis)
The standards and interoperability challenges of using the Resource Description Framework (RDF) for data resources in linked data. Based on work from CNGL (www.cngl.ie), the FALCON project (www.falcon-project.eu) and the LIDER project (www.lider-project.eu).
SDA2013 Pundit: Creating, Exploring and Consuming Annotations (Marco Grassi)
The document describes Pundit, a semantic annotation tool that allows users to create, explore, and consume semantic annotations. Pundit uses an annotation model based on the Open Annotation Collaboration specification. It allows users to organize annotations into notebooks and provides APIs to programmatically access and visualize the annotation data.
- FactForge is a semantic data service that provides access to a large collection of heterogeneous linked open data through inference and a reference ontology.
- It allows exploration of inferred knowledge through SPARQL queries, an RDF search, and relationship browsing.
- Challenges include cleaning input data, detecting contradictions, consistency checking, and curating and upgrading the methodology. FactForge has been used to generate linked data from unstructured sources and integrate metadata.
Generating Lexical Information for Terminology in a Bioinformatics Ontology (Hammad Afzal)
This document discusses generating lexical information for terms in a bioinformatics ontology. It proposes a model called LexInfo for associating linguistic information with ontologies. The authors lexicalize a bioinformatics ontology called myGrid by creating a LexInfo-based lexicon that captures morphological, syntactic and semantic properties of terms. They generate lexicons both semi-automatically using domain resources and automatically using LexInfo tools. The automatic lexicon has some errors due to POS tagging and tokenization issues that could be addressed using domain knowledge. The enriched ontology may help with automatic annotation of bioinformatics services.
The Datalift Project aims to publish and interconnect government open data. It develops tools and methodologies to transform raw datasets into interconnected semantic data. The project's first phase focuses on opening data by developing an infrastructure to ease publication. The second phase will validate the platform by publishing real datasets. The goal of Datalift is to move data from its raw published state to being fully interconnected on the Semantic Web.
Semantic Technologies and Triplestores for Business Intelligence (Marin Dimitrov)
This document provides an introduction to semantic technologies and triplestores. It discusses the Semantic Web vision of making data on the web more accessible and linked. Key concepts covered include RDF, ontologies, OWL, SPARQL and Linked Data. It also introduces triplestores as RDF databases for storing and querying semantic data and compares their features to traditional databases.
Gathering Lexical Linked Data and Knowledge Patterns from FrameNet (Andrea Nuzzolese)
The document discusses transforming FrameNet, a lexical knowledge base, into Linked Open Data (LOD) and knowledge patterns. It presents several semantic issues with representing linguistic resources and proposes a two-step method using Semion to address these issues. The method first syntactically transforms FrameNet data into RDF triples, then applies a rule-based refactoring to add semantics. Ongoing work includes linking FrameNet to other LOD resources like WordNet and VerbNet. The transformation aims to publish FrameNet as a LOD dataset and convert its data into reusable knowledge patterns.
This document discusses modelling and representing social network data ontologically. It covers representing social individuals and relationships ontologically, as well as aggregating and reasoning with social network data. It discusses ontology languages like RDF, OWL, and FOAF that can be used to represent social network data and individuals semantically. It also talks about state-of-the-art approaches for representing network structure and attribute data, and the need for representations that can integrate different data sources and maintain identity.
Opening up MOOCs for OER management on the Web of linked data (Gilbert Paquette)
This document discusses managing open educational resources (OERs) using linked data and the semantic web. It proposes using the Resource Description Framework (RDF) model to address issues with OER adoption like the multiplicity of standards and profiles. COMÈTE, an OER repository manager developed by the author, uses RDF triples and complies with the ISO metadata standard to enable searching and indexing OERs across repositories. The author argues this approach could help personalize massive open online courses (MOOCs) by allowing learners to integrate alternative resources from OER repositories into their courses.
An Introduction to the IMS Learning Object Discovery and Exchange (LODE) Spec... (David Massart)
An introduction to the IMS Learning Object Discovery & Exchange (LODE) specification. The three LODE data models, (1) the Registry Data Model, (2) the Information for Learning Object Exchange Data Model, and (3) the LODE CQL Context Set, are briefly presented.
This article advocates that information storage requirements should not be expressed in the form of data models or conceptual schemas, but database structures should allow for any expression in a general purpose language, whereas implementation constraints should be expressed as constraints on the use of the general purpose language.
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS... (Menzo Windhouwer)
- The ISOcat Data Category Registry collaboratively defines widely accepted linguistic data categories in a standardized way.
- It provides persistent identifiers for each data category specification to support their reuse in language resources standards and applications.
- Data categories are organized by thematic domain groups and their specifications go through a validation and standardization procedure defined by ISO 12620.
Database Management Systems - Management Information System (Nijaz N)
A DBMS is software that acts as an interface between application programs and the data files. It helps to reduce data redundancy and eliminate data inconsistency by providing a central, shared data source.
Multilingual issues in the representation of international bibliographic stan... (Gordon Dunsire)
The document discusses multilingual issues in representing international bibliographic standards for the semantic web. It outlines IFLA's standards for bibliographic data and its namespace used to represent the standards and vocabularies as RDF. Translating the standards into different languages exposed challenges regarding scope, style, source documentation, disambiguation, and language inflection. The presentation calls for authoritative translations of cataloguing standards and related documents in 26+ languages.
The IMLS-funded project Linked Data for Professional Education (LD4PE) has created a "Competency Index for Linked Data".
The Index provides a concise and readable map of concepts and skills related to the practices and technologies of Linked Data for the benefit of interested learners and their teachers.
The Web of Linked Open Data, or LOD, is the most relevant achievement of the Semantic Web. Initially proposed by Tim Berners-Lee in a seminal paper published in Scientific American in 2001, the Semantic Web envisions a web where software agents can interact with large volumes of structured, easy-to-process data. Users now have at their disposal the first mature results of this vision. Among them, and probably the most significant, are the LOD initiatives and projects that publish open data in standard formats like RDF.
This presentation provides an overview and comparison of different LOD initiatives in the area of patent information, and analyses potential opportunities for building new information services based on widely available datasets of patent information. The information is based on interviews conducted with innovation agents and on an analysis of the professional literature and current implementations.
LOD opportunities are not restricted to information aggregators; they extend to end users and innovation agents who have to deal with large amounts of data. In both cases, the opportunities offered by LOD need to be assessed, as LOD has only recently become a standard, universal method to distribute, share and access data.
The document summarizes several projects conducted by Microsoft Research related to scholarly communication. It discusses tools developed to aid scientific research through better data analysis, collaboration, dissemination of research outputs, and archiving of published literature and data. Specific projects highlighted include developing semantic markup and chemical drawing tools in Word 2007, integrating gene expression data with research papers using Word 2007's Open Packaging Conventions format, and establishing workflows for archiving datasets submitted with published articles.
The document defines key terms related to semantic technologies and the semantic web including:
- Linked Open Data (LOD) which publishes open data according to semantic web standards and links it to other sources to create a web of data.
- LOD2, an EU project developing infrastructure for building LOD.
- OWL, a language for more expressive semantic modeling.
- R2RML, a standard for mapping data in relational databases to RDF.
- RDF, the standard data model using triples to represent information.
The document discusses using semantic technologies like XML, RDF, and OWL to represent data on the web in a structured format that is accessible to machines. It describes two main approaches for accessing semantic data on the deep web: ontology plug-in search and deep web service annotation. Both approaches require a semantic web crawler or bot to harvest concepts from deep web forms and iteratively link them to build enriched ontologies that define domain terms and relationships to provide machine-interpretable meaning.
Similar to LDL 2012 - Linking to ISOcat Data Categories (20)
This document discusses CMD2RDF, a project that converts metadata in the Component Metadata Infrastructure (CMDI) format to the Resource Description Framework (RDF) format to make the metadata queryable as Linked Open Data. The conversion includes mapping the CMDI schema to RDFS classes and properties, mapping specific CMDI profiles and components to RDFS subclasses and subproperties, and converting individual CMDI records to RDF instances. The converted RDF data is loaded into a Virtuoso triplestore and made available for SPARQL and REST queries through a browser interface. This allows linking between CMDI metadata and other Linked Data sources.
This presentation gives an overview of: 1) Fedora Commons, 2) it's current use by CLARIN B centres, and 3) the new TLA/FLAT setup that meets the CLARIN B centre requirements using the Fedora Commons/Islandora stack.
ISOcat and RELcat, two cooperating semantic registriesMenzo Windhouwer
M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.
The document discusses semantic mapping in CLARIN Component Metadata Infrastructure (CMDI). CMDI allows flexible yet semantically interoperable metadata descriptions through the use of explicit schemas and semantic registries like ISOcat and RelationRegistry. These registries define concepts and relationships that can be shared across metadata profiles and elements. Semantic mapping helps achieve recall and disambiguation in metadata searches across the diverse set of CMDI profiles and components.
This document proposes a CMD Core Model for describing CLARIN web services with consistent metadata. The model aims to align different profiles used by national CLARIN initiatives while allowing extensions. It specifies using a technical service description along with semantic metadata to allow profile matching and basic service invocation. The model represents services, operations, input/output parameters in a UML diagram and transforms it to a CMD component structure. Compliant parameter matching is supported on MIME type, data type, category and semantic type levels. Usage of the core model and areas for future work are discussed.
What do cats have to do with explicit semantics?Menzo Windhouwer
This document discusses how the ISOcat Data Category Registry and related registries can help make semantics explicit when using linguistic resources and tools. ISOcat defines elementary descriptors called data categories that can be shared between resources to clarify meaning. Related proposed registries include SCHEMAcat for sharing structures, RELcat for ontological relationships, and others, forming a semantic network to enable collaborative work. The goal is to make semantics persistent and resolve differences over time as meanings change.
This document provides a tutorial on the CMDI metadata standard and the ISOcat Data Category Registry. It discusses ISOcat's role as a registry for linguistic data categories, how data categories provide semantics for CMDI metadata elements and components, and how they are referenced in CMDI. It also provides an overview of the status and standardization process of the CMDI metadata profile.
This document discusses how ISOcat, a data category registry that implements ISO 12620, can be used within CMDI (Component Metadata Infrastructure) to provide standardized semantics for metadata elements and values. It describes how CMD components, elements, and items can link to standardized data category concepts in ISOcat to clarify their meaning. The status of the ISOcat metadata thematic domain group and standardization process is provided. Trust in the data category registry and individual data categories is addressed. Upcoming features for ISOcat like a user forum and improved standardization support are outlined.
ISOcat is a data category registry that allows users to create, share, and standardize data categories. It aims to provide consistent terminology for annotating linguistic resources. Data categories have an administrative part for identification info, a descriptive part for documentation in different languages, and a linguistic part for conceptual domains. The registry is overseen by a board and categories go through a standardization process involving thematic domain groups. ISOcat helps promote interoperability between linguistic resources and annotations by allowing resources to reference standardized data categories.
Sustainable operability: Keeping complex linguistic resources alive.Menzo Windhouwer
The document discusses the challenge of ensuring sustainable operability for complex linguistic resources like typological databases. It proposes mapping database contents and metadata to a standardized, self-describing format called IDDF when archiving. Generic software can then provide access to archived resources. The Typological Database System integrates multiple databases and maps their proprietary data models to IDDF to allow long-term usability. The TDS Curator project aims to establish this approach for typological databases within the CLARIN infrastructure.
Sustainable operability: Keeping complex linguistic resources alive.
LDL 2012 - Linking to ISOcat Data Categories
1. www.isocat.org
Linking to
Linguistic Data Categories
in ISOcat
Menzo Windhouwer (a), Sue Ellen Wright (b)
(a) The Language Archive - MPI for Psycholinguistics, (b) Kent State University
menzo.windhouwer@mpi.nl, sellenwright@gmail.com
2. www.isocat.org
Outline
• A short introduction to data categories
– the ISOcat registry
• How to refer to ISOcat data categories
– using PIDs
– from XML and RDF resources
• Fine-tuning (personal) relationships between
data categories
– the RELcat registry
• Status
7-9 March 2012, Linked Data in Linguistics - DGfS 2012
3. www.isocat.org
ISOcat: a Data Category Registry
• An implementation of ISO 12620:2009
– Terminology and other content and language resources —
Specification of data categories and management of a Data
Category Registry for language resources
• Successor to ISO 12620:1999 which contained a hardcoded list of
Data Categories
• A data category
– is the result of the specification of a given data field
– an elementary descriptor in a linguistic structure or an
annotation scheme
4. www.isocat.org
Data Category example
• Data category: /Grammatical gender/
– Administrative part:
• Identifier: grammaticalGender
• PID: http://www.isocat.org/datcat/DC-1297
– Descriptive part:
• English definition: Category based on (depending on languages)
the natural distinction between sex and formal criteria.
• French definition: Catégorie fondée (selon la langue) sur la
distinction naturelle entre les sexes ou d'autres critères formels.
– Conceptual domain:
• Morphosyntax conceptual domain:
/masculine/, /feminine/, /neuter/, /common/
– Linguistic part:
• French conceptual domain: /masculine/, /feminine/
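The parts of the /Grammatical gender/ specification above can be pictured as one record. The following is a minimal sketch, not the ISOcat data model itself: the class name `DataCategory`, its field names and the `values_for` helper are assumptions made for illustration, and only the identifier, PID, English definition and value lists come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataCategory:
    """Hypothetical in-memory view of one data category specification."""

    # Administrative part
    identifier: str
    pid: str
    # Descriptive part: language code -> definition
    definitions: dict[str, str] = field(default_factory=dict)
    # Conceptual domains: profile name -> allowed simple data categories
    conceptual_domains: dict[str, list[str]] = field(default_factory=dict)
    # Linguistic part: language code -> language-specific value domain
    linguistic_domains: dict[str, list[str]] = field(default_factory=dict)

    def values_for(self, profile: str, language: str | None = None) -> list[str]:
        """Return the value domain, narrowed to a language when one is specified."""
        if language in self.linguistic_domains:
            return self.linguistic_domains[language]
        return self.conceptual_domains.get(profile, [])


gender = DataCategory(
    identifier="grammaticalGender",
    pid="http://www.isocat.org/datcat/DC-1297",
    definitions={
        "en": "Category based on (depending on languages) the natural "
              "distinction between sex and formal criteria.",
    },
    conceptual_domains={
        "morphosyntax": ["masculine", "feminine", "neuter", "common"],
    },
    linguistic_domains={"fr": ["masculine", "feminine"]},
)
```

In this sketch a language-specific domain, when present, takes precedence over the profile-wide conceptual domain, which mirrors how the French linguistic part narrows the morphosyntax values.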
5. www.isocat.org
Data Category types
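• complex data categories:
– open: /writtenForm/ (data type: string)
– closed: /grammaticalGender/ (data type: string; value domain: /masculine/, /feminine/, /neuter/)
– constrained: /email/ (data type: string; constraint: .+@.+)
• simple data categories: /masculine/, /feminine/, /neuter/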
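The three kinds of complex data category differ in how a value is checked. A small sketch of that difference follows; the function `is_valid` and its parameters are hypothetical, and treating the constraint as a regular expression that must match the whole value is an assumption.

```python
import re

# Value domain of the closed data category /grammaticalGender/ (from the slide).
GENDER_VALUES = ("masculine", "feminine", "neuter")


def is_valid(value, kind, value_domain=(), constraint=None):
    """Check a value against an open, closed or constrained data category."""
    if kind == "open":
        # Open: any value of the data type (here: string) is acceptable.
        return isinstance(value, str)
    if kind == "closed":
        # Closed: the value must be one of the listed simple data categories.
        return value in value_domain
    if kind == "constrained":
        # Constrained: the value must satisfy the rule (assumed: a full regex match).
        return re.fullmatch(constraint, value) is not None
    raise ValueError(f"unknown data category type: {kind}")
```

For example, `is_valid("neuter", "closed", value_domain=GENDER_VALUES)` accepts the value, while `is_valid("isocat", "constrained", constraint=r".+@.+")` rejects it because no `@` is present.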
6. www.isocat.org
Data Category types
• container: /lexicon/
– /language/ (value: /japanese/)
– /alphabet/ (value: /ipa/)
– /entry/, containing /lemma/ with /writtenForm/
7. www.isocat.org
Data Category relationships
• Value domain membership
– e.g., /pronoun/ in the value domain of /partOfSpeech/ (string)
• Subsumption relationships between simple data categories (legacy)
– e.g., /personal pronoun/ is a /pronoun/
• Relationships between complex/container data categories are not stored in the DCR
8. www.isocat.org
ISOcat: a Data Category Registry
• You can:
– Find Data Categories relevant for your resources and embed references to
them so the semantics of (parts of) your resources are made explicit
• This can be supported by the tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor interact directly with ISOcat
– Interact with Data Category owners to improve (the coverage of) their Data
Categories
– Create (together with others) new Data Categories and/or selections needed
for your resources and share those
– Submit (your) Data Categories for standardization
• ISOcat is the DCR for ISO TC 37
– Free of charge
– Grass roots approach
9. www.isocat.org
The usage of data categories?
• A (schema for a) typological database: Language with the fields BWO and genders, linked to /wordOrder/ and /grammaticalGender/
• A (schema for a) lexicon: Lexicon, Lexical Entry, Lemma, Form, Word Form and Sense (with cardinalities 1..* and 0..*), linked to /partOfSpeech/, /writtenForm/, /grammaticalGender/ and /lexicalType/
10. www.isocat.org
Referencing Data Categories
• Each Data Category should be uniquely identifiable
– Ambiguity: different domains use the same term but mean different
‘things’
– Semantic rot: even in the same domain the meaning of a term
changes over time
– Persistence: for archived resources Data Category references should
still be resolvable and point to the specification as it was at/close to
time of creation
• Persistent IDentifiers
– ISO 24619:2011 Language resource management - Persistent
identification and sustainable access (PISA)
– ISOcat uses ‘cool URIs’:
• http://www.isocat.org/datcat/DC-1297 (/grammaticalGender/)
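Because the PID, not the local name, carries the identity, two resources that label a field differently can still be recognised as using the same data category. A minimal sketch, assuming the PID pattern shown above holds for all data categories; the helper names and the two example vocabularies are hypothetical.

```python
import re
from typing import Optional

# Pattern derived from the example PID on the slide.
PID_PATTERN = re.compile(r"^http://www\.isocat\.org/datcat/DC-(\d+)$")


def pid_for(number: int) -> str:
    """Build the 'cool URI' for a data category number."""
    return f"http://www.isocat.org/datcat/DC-{number}"


def number_from_pid(pid: str) -> Optional[int]:
    """Return the data category number, or None if the URI is not an ISOcat PID."""
    match = PID_PATTERN.match(pid)
    return int(match.group(1)) if match else None


# Two hypothetical resources using different local names for the same category.
lexicon_fields = {"gender": pid_for(1297)}
typology_fields = {"genders": "http://www.isocat.org/datcat/DC-1297"}


def same_category(pid_a: str, pid_b: str) -> bool:
    """Names may differ or be ambiguous; comparing PIDs is unambiguous."""
    return pid_a == pid_b
```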
11. www.isocat.org
XML – DC Reference vocabulary
• ISO 12620:2009 is rather XML oriented
– why not RDF?
• history
– terminology management is a separate tradition from Semantic Web/Linked Data
– DCIF -> GMT (TMF) -> own XML vocabulary based on UML data model
• but there is an RDF representation
– needs to cover more of the data model
• Annex A provides the DC reference vocabulary (www.isocat.org/12620/)
– dcr:datcat to link to any DC
– dcr:valueDatcat to link to a simple DC
• Preferably annotate a schema, e.g., a Relax NG or W3C XML Schema document
• XML vocabularies might also provide their own means to link to a data
category
– TBX XCS, TEI ODD, CMDI, ..., TEI (?)
• (Semantics by reference)
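As an illustration of annotating a schema, the sketch below embeds dcr:datcat attributes in a tiny W3C XML Schema fragment and collects them again. The schema fragment and the function name are invented for this example, and the namespace URI bound to the dcr prefix is an assumption; only the attribute name and the two PIDs are taken from the slides.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
DCR_NS = "http://www.isocat.org/ns/dcr"  # assumed namespace URI for the dcr prefix

# Hypothetical schema fragment: two elements annotated with data category PIDs.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="pos" type="xs:string"
              dcr:datcat="http://www.isocat.org/datcat/DC-396"/>
  <xs:element name="gender" type="xs:string"
              dcr:datcat="http://www.isocat.org/datcat/DC-1297"/>
</xs:schema>"""


def datcat_references(schema_text):
    """Map each annotated element name to the data category PID it refers to."""
    root = ET.fromstring(schema_text)
    references = {}
    for element in root.iter(f"{{{XS_NS}}}element"):
        pid = element.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:
            references[element.get("name")] = pid
    return references
```

Annotating the schema rather than every instance document keeps the references in one place: any instance validated against the schema inherits the semantics by reference.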
13. www.isocat.org
RDF – DC annotation property
• The dcr:datcat RDF annotation property mimics the DC
Reference vocabulary
– minimizes impact, i.e., allows the data model to use its own terminology
– can be tuned using OWL (2) equivalentClass, equivalentProperty or sameAs
– problem: annotating literals with simple Data Categories (names can be
ambiguous)
@prefix dcr: <http://www.isocat.org/ns/dcr.rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.org/lexicon#> . # example namespace (an assumption, not from the slide)
:headword dcr:datcat <http://www.isocat.org/datcat/DC-258> ;
    rdfs:label "head word"@en ;
    rdfs:comment "A lemma heading a dictionary entry."@en .
:partOfSpeech dcr:datcat <http://www.isocat.org/datcat/DC-396> ;
    rdfs:label "part of speech"@en ;
    rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
14. www.isocat.org
RDF – directly use Data Category PIDs
• Container Data Categories as RDF classes
• Complex Data Categories as RDF properties
• Simple Data Categories
– as RDF literals
• problem: names can be ambiguous
– as RDF classes
• (GrAF example <f name="" val=".../DC-3581"/> vs <f name="" val="plural noun" dcr:datcat=".../DC-3581"/>)
@prefix cat: <http://www.isocat.org/datcat/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
cat:DC-258 rdfs:label "head word"@en ;
    rdfs:comment "A lemma heading a dictionary entry."@en .
cat:DC-396 rdfs:label "part of speech"@en ;
    rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
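A consumer of RDF resources may meet both styles: terms annotated with dcr:datcat and terms that are themselves Data Category PIDs. The sketch below resolves either to a PID. It is an illustration only: triples are plain tuples of URI strings, the function name is hypothetical, and the `http://example.org/lexicon#` terms are made up.

```python
DCR_DATCAT = "http://www.isocat.org/ns/dcr.rdf#datcat"
PID_PREFIX = "http://www.isocat.org/datcat/"

# Hypothetical triples in the annotation style: own terminology plus dcr:datcat.
TRIPLES = [
    ("http://example.org/lexicon#headword", DCR_DATCAT, PID_PREFIX + "DC-258"),
    ("http://example.org/lexicon#partOfSpeech", DCR_DATCAT, PID_PREFIX + "DC-396"),
]


def data_category_of(term, triples):
    """Return the Data Category PID a term stands for, or None if unknown."""
    if term.startswith(PID_PREFIX):
        # Direct style: the term already is a Data Category PID.
        return term
    for subject, predicate, obj in triples:
        # Annotation style: follow the dcr:datcat annotation property.
        if subject == term and predicate == DCR_DATCAT:
            return obj
    return None
```

The annotation style costs one extra lookup but leaves the data model free to use its own terminology; the direct style needs no lookup but ties the model to the PIDs.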
15. www.isocat.org
Data Category Relations
• In the linked data world it is natural to have
ontological relationships next to
structural ones
– RDFS, OWL (2), SKOS, ...
• But other resource/schema formats lack these
features
• Relationships between Data Categories (also
across vocabularies) are important for
federated search, i.e., to find semantically
related resources in another archive
16. www.isocat.org
RELcat: a Relation Registry
• Stores relationships among Data Categories and also with ‘other’ concept
registries
– Dublin Core, OLAC, GOLD
– (OLiA, OntoLingAnnot)
– relationships can be the individual view of a (group of) linguist(s)
• RELcat is a quad store (graph, subject, predicate, object)
• Based on a ‘private’ relation type taxonomy so existing relationships
specified in other vocabularies can easily be loaded
– OWL (2), SKOS
– normalized RELcat queries
• The aim is to support various levels of traversing the semantic
network, not formal reasoning
– conflicting (theoretical) views
• (parameters of variation)
– but within a known combination of relation sets reasoning may well be possible
– also targets semantic search outside of the RDF domain
17. www.isocat.org
Relation type taxonomy
1. related
   1. same as (a symmetric and transitive relationship)
   2. almost same as (a symmetric relationship)
   3. broader than (a transitive relationship and the inverse of the ‘narrower than’ relationship)
      1. superclass of (a transitive relationship and the inverse of the ‘subclass of’ relationship)
      2. has part (a transitive relationship and the inverse of the ‘part of’ relationship)
         1. has direct part (the inverse of the ‘direct part of’ relationship)
   4. narrower than (a transitive relationship and the inverse of the ‘broader than’ relationship)
      1. subclass of (a transitive relationship and the inverse of the ‘superclass of’ relationship)
      2. part of (a transitive relationship and the inverse of the ‘has part’ relationship)
         1. direct part of (the inverse of the ‘has direct part’ relationship)
19. www.isocat.org
Extension
1. related
   1. same as (a symmetric and transitive relationship)
      1. owl:equivalentClass
      2. owl:equivalentProperty
      3. owl:sameAs
      4. skos:exactMatch
   2. almost same as (a symmetric relationship)
      1. skos:closeMatch
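One way to picture the taxonomy and its extension is a child-to-parent table, so that a predicate from OWL or SKOS can be normalized to the relation type it falls under. This is a sketch under assumptions: the camel-case names (apart from sameAs, which appears in the RELcat query) and both helper functions are hypothetical, not RELcat's actual identifiers or implementation.

```python
# Child -> parent links of the relation type taxonomy, plus the extension
# that hangs OWL and SKOS predicates under the RELcat relation types.
PARENT = {
    "sameAs": "related",
    "almostSameAs": "related",
    "broaderThan": "related",
    "narrowerThan": "related",
    "superClassOf": "broaderThan",
    "hasPart": "broaderThan",
    "hasDirectPart": "hasPart",
    "subClassOf": "narrowerThan",
    "partOf": "narrowerThan",
    "directPartOf": "partOf",
    "owl:equivalentClass": "sameAs",
    "owl:equivalentProperty": "sameAs",
    "owl:sameAs": "sameAs",
    "skos:exactMatch": "sameAs",
    "skos:closeMatch": "almostSameAs",
}


def ancestors(relation):
    """List the relation types above a relation, nearest first."""
    chain = []
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain


def is_a(relation, target):
    """True if the relation equals the target type or falls under it."""
    return relation == target or target in ancestors(relation)
```

With such a table a query for "same as" can also pick up statements that were loaded as skos:exactMatch or owl:sameAs, while skos:closeMatch only surfaces at the "almost same as" or "related" level.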
20. www.isocat.org
Normalized query
PREFIX rel:<http://www.isocat.org/relcat/relations#>
PREFIX cat:<http://www.isocat.org/datcat/>
SELECT ?c WHERE { cat:DC-2482 rel:sameAs ?c . }
• Finds the same-as clique for /languageID/ (DC-2482)
specified in any vocabulary, e.g., RELcat (CMDI) for
Dublin Core and annotated OWL for GOLD
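What such a normalized query computes can be sketched outside SPARQL as a walk over quads, treating every "same as" style predicate as symmetric and transitive. This is an illustration, not RELcat's implementation: the quads, graph names and `ex:` terms are invented sample data, and the optional `graphs` filter is an assumed way to restrict the walk to a chosen combination of relation sets.

```python
from collections import deque

# Predicates treated as "same as" after normalization (see the extension slide).
SAME_AS = {
    "rel:sameAs",
    "owl:sameAs",
    "owl:equivalentClass",
    "owl:equivalentProperty",
    "skos:exactMatch",
}

# Invented sample quads: (graph, subject, predicate, object).
QUADS = [
    ("set:cmdi", "cat:DC-2482", "rel:sameAs", "ex:dcLanguage"),
    ("set:gold", "ex:goldLanguageTerm", "owl:sameAs", "cat:DC-2482"),
    ("set:gold", "ex:otherTerm", "skos:closeMatch", "cat:DC-2482"),
]


def same_as_clique(start, quads, graphs=None):
    """Collect everything reachable from start over 'same as' relations."""
    neighbours = {}
    for graph, subject, predicate, obj in quads:
        if graphs is not None and graph not in graphs:
            continue  # skip relation sets the user did not select
        if predicate in SAME_AS:
            # Symmetric: record the link in both directions.
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    seen = {start}
    queue = deque([start])
    while queue:
        # Transitive: breadth-first walk over the recorded links.
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen
```

Keeping the graph name on every relationship is what lets a (group of) linguist(s) choose whose view of the relationships to follow.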
21. www.isocat.org
Semantic network
• Connected: linguistic resource (schema), linguistic knowledge base
• Nodes and links: data categories, containers, concepts, relations
• Registries: Schema Registry - SCHEMAcat, Data Category Registry - ISOcat, Concept Registry, Relation Registry - RELcat
22. www.isocat.org
Status
• ISOcat: in production, mainly lacking in
standardization
– http://www.isocat.org/
• RELcat: alpha version gives read-only access to
some relation sets; some reasoning and the UI
are still lacking
– http://lux13.mpi.nl/isocat/relcat/
• SCHEMAcat: design phase
23. www.isocat.org
Thank you for your attention!
Visit
www.isocat.org
Questions?
www.isocat.org/forum/
isocat@mpi.nl