This document discusses verifying the integrity constraints of the Portuguese WordNet (OpenWordnet-PT) against the ontology for encoding wordnets. It was the first attempt to check the correctness of the resource and to improve its linguistic data by correcting the errors found. Various types of errors were discovered, including datatype errors, domain and range errors, and structural errors. Explanations provided by reasoning tools helped identify and fix these issues, improving the overall quality and accuracy of OpenWordnet-PT.
Abstract:
An increasing number of applications rely on RDF, OWL 2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL 2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
RDF is a general method to decompose knowledge into small pieces, with some rules about the semantics or meaning of those pieces. The point is to have a method so simple that it can express any fact, and yet so structured that computer applications can do useful things with knowledge expressed in RDF.
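The "small pieces" the paragraph above refers to are subject-predicate-object triples. A minimal sketch of that idea (all names here are invented for illustration):

```python
# A toy illustration of the RDF idea: every fact is a
# (subject, predicate, object) triple, and a dataset is just a set of triples.
facts = {
    ("ex:Tim", "ex:created", "ex:RDF"),
    ("ex:RDF", "ex:type", "ex:DataModel"),
    ("ex:Tim", "ex:worksAt", "ex:W3C"),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]

# Any fact fits the same triple shape, so generic code can query all of them:
print(sorted(match(facts, s="ex:Tim")))
```

Because the structure is uniform, the same `match` function works regardless of what the facts are about, which is exactly the "simple yet structured" trade-off described above.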
Lightning talk for the Semantic Web in Libraries (SWIB13) conference on 2013-11-27 about another method of expressing RDF data. See http://gbv.github.io/aREF/ for a preliminary specification.
Matching and merging anonymous terms from web sources (IJwest)
This paper describes a workflow of simplifying and matching special language terms in RDF generated …
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format to represent RDF Data Streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
What is the fuss about triple stores? Will triple stores eventually replace relational databases? This talk looks at the big picture, explains the technology, and tries to look at the road ahead.
FedX - Optimization Techniques for Federated Query Processing on Linked Data (aschwarte)
The final slides of our talk about FedX at the 10th International Semantic Web Conference in Bonn. For details about FedX see http://www.fluidops.com/fedx/
Introducing FRSAD and Mapping it with Other Models (Marcia Zeng)
Report on the work of the IFLA FRSAR (Functional Requirements for Subject Authority Records) Working Group.
1. Introducing the FRSAD model.
2. Mapping to other models (BS 8723 and ISO 25964, SKOS, OWL, & DCMI-AM).
(Presented at IFLA 2009, Milan, August 2009, by Marcia Zeng and Maja Zumer)
Paper and FRSAD report available at: http://nkos.slis.kent.edu/FRSAR/index.html
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at WSDM 2014 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
A linked open data architecture for contemporary historical archives (Alexandre Rademaker)
This presentation describes an architecture for maintaining historical archives, based on Linked Open Data technologies and an open-source, distributed development model and tools. The proposed architecture is being implemented for the archives of the Center for Teaching and Research in the Social Sciences and Contemporary History of Brazil (CPDOC) at the Getulio Vargas Foundation (FGV).
Whether you are a seasoned WordPress developer or new to WordPress, this session will discuss the basics of setting up WordPress using WPI (Web Platform Installer). We will walk through the basic WPI setup, WordPress installation, database configuration, and general setup procedures on your localhost.
Doctoral Examination at the Karlsruhe Institute of Technology, 08.07.2016 (Dr.-Ing. Thomas Hartmann)
In this thesis, a validation framework is introduced that enables RDF-based constraint languages to be executed consistently on RDF data and constraints of any type to be formulated. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, consists of a small lightweight vocabulary, ensures consistent validation results, and enables constraint transformations for each constraint type across RDF-based constraint languages.
Piloting Linked Data to Connect Library and Archive Resources to the New Worl... (Laura Akerman)
Presentation for the CNI (Coalition for Networked Information) Fall Forum, December 2012. Describes Emory University Library’s first-hand experience in interlinking Civil War-related materials and other online resources by leveraging open linked data principles. The library has been actively evaluating linked data’s potential to replace current library processes and services (bibliographic services, finding aids, cataloging, and metadata work) as a more efficient and sustainable means, and one that could bring greater benefit to end users for research and learning. The Library’s initial focus was on workforce education and hands-on learning through real-time experiments: the Connections project was begun to prepare staff to work with linked data, a process that has culminated in a 3-month hands-on pilot to build and convert some data. The pilot introduced the concept to a wide range of staff, including subject liaisons, archivists, metadata librarians, and programmers. Emory’s “silos” of data were interlinked with other open data sources as a way to enhance user discovery and use of library materials on a very limited scale.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... (Stuart Chalk)
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Neno/Fhat: Semantic Network Programming Language and Virtual Machine Specific... (Marko Rodriguez)
• The Semantic Web is a distributed, flexible modeling framework.
• The Semantic Web is primarily descriptive in nature. The Semantic Web is used to describe web-pages, services, systems, etc.
• Neno is an object-oriented language that was designed specifically for the Semantic Web.
• Fhat is a virtual machine represented in the Semantic Web.
• With Neno/Fhat the Semantic Web now has a procedural component. The Semantic Web now includes object methods, algorithms, and computing machines.
• The Semantic Web can be made to behave like a distributed, general-purpose computer. Not just an information repository.
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases (Daniel Sonntag)
We implemented a generic dialogue shell that can be configured for and applied to domain-specific dialogue applications. The dialogue system works robustly for a new domain when the application backend can automatically infer previously unknown knowledge (facts) and provide explanations for the inference steps involved. For this purpose, we employ URDF, a query engine for uncertain and potentially inconsistent RDF knowledge bases. URDF supports rule-based, first-order predicate logic as used in OWL-Lite and OWL-DL, with simple and effective top-down reasoning capabilities. This mechanism also generates explanation graphs. These graphs can then be displayed in the GUI of the dialogue shell and help the user understand the underlying reasoning processes. We believe that proper explanations are a main factor for increasing the level of user trust in end-to-end human-computer interaction systems.
These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 .
A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .
Phenomics assisted breeding in crop improvement (IshaGoswami9)
As the global population grows toward roughly 9 billion by 2050, and with climate change compounding the problem, it will be difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multi-gene traits, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional foods, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is growing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility (SciAstra)
The Indian Statistical Institute (ISI) has extended its application deadline for 2024 admissions to April 2. Known for its excellence in statistics and related fields, ISI offers a range of programs from Bachelor's to Junior Research Fellowships. The admission test is scheduled for May 12, 2024. Eligibility varies by program, generally requiring a background in Mathematics and English for undergraduate courses and specific degrees for postgraduate and research positions. Application fees are ₹1500 for male general category applicants and ₹1000 for females. Applications are open to Indian and OCI candidates.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not cause any harm to human health, quality assessment of food is still required to provide consumers with the necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for the detection of highly unstable radicals in food. The antioxidant capability of liquid food and beverages is mainly measured by the spin trapping technique.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr... (Travis Hills MN)
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Deep Software Variability and Frictionless Reproducibility
Verifying Integrity Constraints of an RDF-based WordNet
1. Verifying Integrity Constraints of an RDF-based WordNet
Fabricio Chalub
Alexandre Rademaker
IBM Research, Brazil
January 30, 2016
Rademaker, Chalub (IBM Research) ICC on OpenWordnet-PT January 30, 2016 1 / 21
2. OpenWordnet-PT
http://wnpt.brlcloud.com/wn/
Goal: not a simple translation of PWN, but based on the PWN architecture.
Originally created from a (PT) projection of the Universal WordNet (Gerard de Melo).
Three language strategies in its lexical enrichment process: (i) translation; (ii) corpus extraction; (iii) dictionaries.
Corpora: AC/DC project, DHBB CPDOC/FGV, etc.
LR: morphosemantic links, nominalizations from NomLex, Nomage, Wiktionary, etc.
Freely available since Dec 2011. Download as RDF files, query via SPARQL, or browse via the web interface (above).
Used by “Google Translate”, FreeLing, OMW, BabelNet and Onto.PT.
3. Why RDF
<definition gloss="This is my definition">
<meta creator="me"/></definition>

<definition><meta creator="me"/>
This is my definition</definition>

<definition><meta creator="me"/>
<text>This is my definition</text>
</definition>

<lmf:definition meta:creator="me">
This is my definition</lmf:definition>
“Why RDF model is different from the XML model” by Tim Berners-Lee
(1998). http://www.w3.org/DesignIssues/RDF-XML.html
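All four XML variants above encode the same information; in the RDF model they collapse into the same two statements. A minimal sketch (node and property names invented for illustration):

```python
# The content of any of the four XML shapes above, seen as RDF triples.
# "ex:definition-1", "ex:gloss" and "ex:creator" are invented identifiers.
definition = "ex:definition-1"
triples = {
    (definition, "ex:gloss", "This is my definition"),
    (definition, "ex:creator", "me"),
}

# Whichever XML serialization was chosen, a reader should end up with
# exactly these two subject-predicate-object statements.
for t in sorted(triples):
    print(t)
```

This is the "many to one" mapping the next slide mentions: many XML trees, one RDF graph.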
4. Why RDF (cont.)
There is a mapping from XML documents to semantic graphs.
The element names were a big hint for a human reader.
Without the schema (DTD, XML Schema), you know things about the doc structure, but nothing else. You can’t tell what to deduce. You can’t even really tell what real questions can be asked.
(1) the mapping is many to one; (2) you need a schema to know what the mapping is (there is no inference language); (3) the expression you need for querying something in terms of the XML tree is necessarily more complicated than the expression you need for querying something in terms of the RDF graph.
“give me the properties with the same metadata as this one?”
5. Why RDF (cont.)
Giving a machine a knowledge tree vs. giving a person a document.
A document for a person is generally serialized so that, when read
serially by a human being, the result will be to build up a graph of
associations in that person’s head. The order is important.
For a graph of knowledge, order is not important, so long as the nodes
in common between different statements are identified consistently.
6. Linked Data
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things
(ILI?)
A number of Linked Data projects for lexical resources.
“Linked Data” by Tim Berners-Lee (2006).
7. Some words about vocabularies
To encode data, we need to decide which classes and properties to
use!
There are different vocabularies for encoding wordnets in RDF!
Adopting already-defined vocabularies helps data interoperability, since
it makes the data easier to integrate with other resources.
We use http://www.w3.org/TR/wordnet-rdf/ from 2006.
Scripts available http://github.com/own-pt/.
9. This paper
Our first attempt at verifying the integrity constraints of OpenWordnet-PT
against the ontology for encoding wordnets.
Correcting and improving linguistic data is a hard task.
So far, there are no clear criteria for the semantic evaluation of wordnets,
nor ways of comparing their relative quality or accuracy. Thus the qualitative
assessment of a new wordnet seems, presently, a matter of judgment and art.
10. OWL and RDF
Consistency check of OWL and Integrity Constraints in RDF
OWL Lite and OWL DL semantics are based on Description Logics (DL).
DL are a family of logics that are decidable fragments of first-order
logic with attractive and well-understood computational properties.
A DL knowledge base comprises two components: the TBox and the
ABox. The TBox contains intensional knowledge (terminology);
the ABox contains extensional (assertional) knowledge.
Intensional knowledge is usually assumed not to change, while
extensional knowledge is usually taken to be contingent.
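In the deck’s own vocabulary, the split can be sketched in Turtle (a
hypothetical fragment, using classes and an instance that appear later in
these slides):

TBox (terminology):
  wn30:VerbSynset rdfs:subClassOf wn30:Synset .
  wn30:Synset owl:disjointWith wn30:WordSense .

ABox (assertions):
  synset-01220528-v a wn30:VerbSynset .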
11. Reasoning
Given an ontology encoded in OWL (Lite or DL) one can use DL reasoners
for different tasks such as: concepts consistency checking, query
answering, classification, etc.
The basic reasoning task over an ABox is instance checking, which verifies
whether a given individual is an instance of (or belongs to) a specified
concept.
In some use cases, we want a method for validating the RDF data
against a given model. In these cases, OWL users intend OWL axioms to
be interpreted as constraints on the RDF data.
12. CWA vs. OWA
OWL default semantics adopts the Open World Assumption (OWA) and
does not adopt the Unique Name Assumption (UNA).
Due to the OWA, a statement must not be inferred to be false on the basis of
a failure to prove it: the fact that a piece of information has not been
specified does not mean that such information does not exist.
On the other hand, the absence of UNA allows two different constants to
refer to the same individual.
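A sketch of the consequence, in the deck’s vocabulary (wordsense-x is a
hypothetical sense): suppose a word sense must be linked to exactly one
word, and the data contains

  wordsense-x wn30:word word-deixar .
  wordsense-x wn30:word word-parar .

Under the default OWL semantics this is not an error: since there is no
UNA, a reasoner simply concludes word-deixar owl:sameAs word-parar. Only
under a closed-world, constraint reading is it flagged as a violation.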
13. Queries in SPARQL
One more motivation for RDF: querying with SPARQL. The query below
(assuming the wn30: prefix is declared) retrieves senses ?ws1 and ?ws2 of
the same word ?w whose synsets are related by a (possibly empty) chain of
hyponymy:
select ?w ?ws1 ?ws2
{
  ?ss1 wn30:containsWordSense ?ws1 .
  ?ws1 wn30:word ?w .
  ?ss2 wn30:containsWordSense ?ws2 .
  ?ws2 wn30:word ?w .
  ?ss1 wn30:hyponymOf* ?ss2 .
}
14. Tools
Protégé is an ontology editor that, among other features, has interfaces to
well-known DL reasoners such as FaCT++ and HermiT.
Starting with version 4, Protégé also gives us an interface to search for
the explanations that caused an inconsistency. Racer and Pellet are reasoners
that have this feature built in.
RDFpro for combining, splitting, syntax checking, etc.
Stardog and AllegroGraph triplestores: Stardog includes integrity constraint
validation (ICV); AllegroGraph has RDF++ semantics.
15. Errors
Errors found can be categorized in three different classes: datatype errors,
domain and range errors, structural errors.
Missing class and property definitions; we improved the OWL file:
wn30:AdjectiveWordSense rdfs:subClassOf
wn30:WordSense .
Literal values with wrong datatypes:
Literal value "00113726" does not
belong to datatype nonNegativeInteger
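Such violations can also be hunted with a plain SPARQL query; a sketch
(the property wn30:synsetId is hypothetical, and the wn30: and xsd:
prefixes must be declared):

select ?s ?v
{
  ?s wn30:synsetId ?v .
  bind (xsd:integer(str(?v)) as ?n)
  filter (!bound(?n) || ?n < 0)
}

If the cast fails, ?n stays unbound and the filter keeps the offending
triple.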
16. Errors
“current account” is the label of wordsense-13363970-n-3, and “Britain”
is the label of wordsense-08860123-n-4.
Explanation for:
Thing SubClassOf Nothing
classifiedByRegion Domain Synset
current_account classifiedByRegion Britain
current_account Type WordSense
Synset DisjointWith WordSense
wordsense-13363970-n-3 classifiedByRegion
wordsense-08860123-n-4
“The following pointer types are usually used to indicate lexical
relations: Antonym, Pertainym, Participle, Also See, Derivationally
Related. The remaining pointer types are generally used to represent
semantic relations.”
17. Fixing Errors
wn30:classifiedByRegion
a rdf:Property, owl:ObjectProperty ;
rdfs:domain wn30:Synset ;
rdfs:range wn30:NounSynset ;
rdfs:subPropertyOf wn30:classifiedBy .
Updated to:
wn30:classifiedByRegion
a rdf:Property, owl:ObjectProperty ;
rdfs:subPropertyOf wn30:classifiedBy ;
rdfs:range [ a owl:Class ;
owl:unionOf (wn30:NounWordSense
wn30:NounSynset)] ;
rdfs:domain [ a owl:Class ;
owl:unionOf (wn30:WordSense wn30:Synset)] .
18. Proofs Explanations
In formal verification, the complexity of the proofs/explanations is itself
a challenge. This is the explanation found for one issue:
synset-01345109-v hypernymOf
synset-01220528-v
VerbWordSense subClassOf WordSense
frame domain VerbWordSense
synset-01220528-v frame
"Somebody ----s something"
hypernymOf range Synset
Synset disjointWith WordSense
synset-01220528-v is found to be of type Synset because it is the object of a
triple with predicate wn30:hypernymOf, and the range of this predicate is the
set of all synsets.
It is also found to be of type WordSense, because it has a frame and the
domain of frame is VerbWordSense, a subclass of WordSense.
But synset-01220528-v is a verb synset, verb synsets are a subset of
synsets, and Synset is disjoint with WordSense.
19. More
Two invalid situations: (a) two or more words associated with a single
word-sense subject; (b) two or more lexical forms associated with a single
word subject.
wordsense-01860795-v-2 type WordSense
word-deixar lexicalForm "deixar"@pt
word-parar lexicalForm "parar"@pt
wordsense-01860795-v-2 word word-deixar
Word subClassOf lexicalForm exactly 1
wordsense-01860795-v-2 word word-parar
word-deixar type Word
word-parar type Word
WordSense subClassOf word exactly 1 Word
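Case (a) can also be caught directly with SPARQL (assuming the wn30:
prefix is declared): the query below lists word senses linked to two
distinct words, a closed-world reading of “word exactly 1 Word”:

select ?ws
{
  ?ws wn30:word ?w1 , ?w2 .
  filter (?w1 != ?w2)
}

Note that a DL reasoner without UNA would instead infer
?w1 owl:sameAs ?w2.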
20. More
Stardog is the only reasoner and database system that supports ICV.
Under the ICV semantics, the axioms below from the wn30:WordSense
class are taken as constraints rather than terminology definitions.
On finding an instance of the class wn30:WordSense connected to more than
one instance of wn30:Word, the validator raises an exception instead of
inferring that the two different wn30:Word instances are the same.
wn30:WordSense
a rdfs:Class, owl:Class ;
rdfs:subClassOf [
a owl:Restriction ;
owl:onProperty wn30:inSynset ;
owl:qualifiedCardinality
"1"^^xsd:nonNegativeInteger ;
owl:onClass wn30:Synset ], [
a owl:Restriction ;
owl:onProperty wn30:word ;
owl:qualifiedCardinality
"1"^^xsd:nonNegativeInteger ;
owl:onClass wn30:Word ] .
21. Conclusions
Linguistic resources are very easy to start, hard to improve, and
extremely difficult to maintain.
Sizes of lexical resources are easy to compare; quality is hard.
A lot of verifications (old? already used/defined?) can be encoded in
OWL axioms. Some of them may require more expressivity (SUMO?).