The document discusses conceptual similarity and how, where, and why it is measured. It provides examples of measuring similarity between different types of objects, such as words, concepts, and hierarchies. It also summarizes various methods used to measure similarity, including precision and recall used in information retrieval, string matching measures to compare lexical terms, and measures to compare conceptual structures by examining hierarchies and semantic relationships between concepts.
Proof-Theoretic Semantics: Point-free meaning of first-order systems (Marco Benini)
This document summarizes a talk on providing a semantics for first-order logical theories using logical categories. The semantics interprets formulae as objects in a category and proofs as morphisms, without assuming elements exist. Quantifiers are interpreted using stars and costars. A logical category is a prelogical category where stars and costars exist to interpret all formulae. This semantics is sound and complete: a formula is true exactly when a corresponding proof morphism exists. The semantics can interpret many other approaches, and inconsistent theories have "trivial" models.
This document proposes a standardized logic notation for use in mathematics classrooms. It describes using the symbols "→" and "↔" as connectives that can return true or false values, and "⟹" and "⟺" only when the corresponding connectives always return true. This addresses inconsistencies in current logic notation usage. Examples are given showing how the standardized notation clarifies logical relationships in calculus, algebra, and other mathematics topics. Adopting this standardized notation is argued to benefit students by making logic portable between courses and instructors.
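The proposed distinction can be sketched in code: a connective such as "→" is an ordinary Boolean function, while the double arrow "⟹" is reserved for cases where that function returns true under every assignment. A minimal illustration (the function names are my own, not from the document):

```python
from itertools import product

def implies(p, q):
    # The connective "→": a Boolean function that can return true or false.
    return (not p) or q

def is_tautology(f, n_vars):
    # The document's criterion for "⟹": the connective returns true
    # under every truth assignment of its variables.
    return all(f(*vals) for vals in product([False, True], repeat=n_vars))

print(is_tautology(lambda p, q: implies(p and q, p), 2))  # True: (p ∧ q) ⟹ p
print(is_tautology(lambda p, q: implies(p or q, p), 2))   # False: only (p ∨ q) → p
```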
Introduction to Distributional Semantics (Andre Freitas)
This document provides an introduction to distributional semantics. It discusses how distributional semantic models (DSMs) represent word meanings as vectors based on their linguistic contexts in large corpora. The distributional hypothesis states that words that appear in similar contexts tend to have similar meanings. The document outlines how DSMs are built, important parameters like context type and weighting, and examples like latent semantic analysis. It also discusses how DSMs can support applications like semantic search. Finally, it introduces how compositional semantics explores representing the meanings of phrases and sentences compositionally based on the meanings of their parts.
What is concept drift and how to measure it? (Shenghui Wang)
The document discusses concept drift in knowledge organization systems. It presents a theory of concept drift that defines the meaning of a concept using intension, extension, and label. It also defines identity, drift, and shift. The document then describes case studies on concept drift in political communication vocabularies, DBpedia, and the LKIF-Core ontology. It analyzes the stability of concepts and identifies examples of concept drift and shift in each case study.
Space Efficient Suffix Array Construction using Induced Sorting LMS Substrings (ijistjournal)
This paper presents a space-efficient algorithm for linear-time suffix array construction. The algorithm uses divide-and-conquer and recursion. What differentiates the proposed algorithm from existing variable-length leftmost S-type (LMS) substring algorithms is its efficient use of memory to construct the suffix array: the modified induced-sorting algorithm for variable-length LMS substrings uses memory more efficiently than the existing variable-length LMS substring algorithm.
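The induced-sorting machinery is too involved for a short snippet, but a naive baseline makes the object being computed concrete; the paper's linear-time algorithm produces exactly this array with less time and working memory:

```python
def suffix_array(s: str) -> list[int]:
    """Return the suffix array of s: the start indices of its suffixes
    in lexicographically sorted order.

    A simple O(n^2 log n) baseline for illustration only; the paper's
    induced-sorting algorithm computes the same array in linear time.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])

# Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```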
Distributional semantics is a research area that uses statistical analysis of linguistic contexts to develop theories and methods for determining the semantic similarities between words and linguistic items based on their distributional properties in large text corpora. It is based on the distributional hypothesis that words with similar distributions have similar meanings. Distributional semantic models represent words as vectors in a high-dimensional semantic space based on their co-occurrence with other words, allowing semantic similarity to be measured using vector similarity methods. Common distributional semantic models include term frequency-inverse document frequency (tf-idf), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings.
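The vector-space idea can be made concrete with a toy co-occurrence matrix; the counts and context labels below are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Invented co-occurrence counts over three contexts: (drink, road, engine).
vectors = {
    "coffee": [10, 0, 1],
    "tea":    [8, 1, 0],
    "car":    [0, 9, 7],
}
print(cosine(vectors["coffee"], vectors["tea"]))  # close to 1: similar contexts
print(cosine(vectors["coffee"], vectors["car"]))  # close to 0: different contexts
```

Words that share contexts ("coffee", "tea") end up with nearly parallel vectors, which is exactly what the distributional hypothesis predicts.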
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors based on the angle between them. K-means clustering partitions documents into k clusters to maximize intra-cluster similarity, while hierarchical clustering merges clusters into a dendrogram based on similarity. The EM algorithm computes maximum likelihood estimates of document distributions. Evaluation of clustering assesses quality based on intra-class and inter-class similarity.
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors and is often used to compare documents, with higher values indicating more similar documents. K-means clustering partitions documents into k groups to maximize intra-cluster similarity, while hierarchical clustering creates a dendrogram of document clusters by progressively merging the most similar pairs. The EM algorithm computes maximum likelihood estimates for document clustering when data is incomplete. Evaluation of document clusters considers internal metrics like intra-cluster similarity and inter-cluster dissimilarity.
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors and is often used to compare documents, with higher values indicating more similar documents. K-means clustering partitions documents into k groups to maximize intra-cluster similarity, while hierarchical clustering creates a dendrogram of document clusters by progressively merging the most similar groups. The EM algorithm computes maximum likelihood estimates for document clustering when data is incomplete. Evaluation of document clusters considers internal metrics like intra-cluster similarity and inter-cluster dissimilarity.
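The assign-then-update loop of k-means described in these summaries can be sketched in a few lines; the toy points and Euclidean distance are illustrative choices (real document clustering would use tf-idf vectors, often with cosine distance):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl
                     else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of three points each -- an invented toy set.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```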
Semantic Web technologies are a set of languages standardized by the World Wide Web Consortium (W3C) and designed to create a web of data that can be processed by machines. One of the core languages of the Semantic Web is Web Ontology Language (OWL), a family of knowledge representation languages for authoring ontologies or knowledge bases. The newest OWL is based on Description Logics (DL), a family of logics that are decidable fragments of first-order logic. leanCoR is a new description logic reasoner designed for experimenting with the new connection method algorithms and optimization techniques for DL. leanCoR is an extension of leanCoP, a compact automated theorem prover for classical first-order logic.
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati... (Jonathon Hare)
Part of the "Reality of the Semantic Gap in Image Retrieval" tutorial at the first international conference on Semantics And digital Media Technology (SAMT 2006), 6 December 2006.
This document presents a novel approach for measuring shape similarity and using it for object recognition. The key steps are:
1) Solving the correspondence problem between two shapes by attaching a descriptor called "shape context" to sample points on each shape. Shape context captures the distribution of remaining points relative to the reference point.
2) Using the point correspondences to estimate an aligning transformation between the shapes. This provides a measure of shape similarity as the matching error between corresponding points plus the magnitude of the transformation.
3) Treating recognition as a nearest neighbor problem to find the most similar stored prototype shape. The approach is demonstrated on various datasets including handwritten digits, silhouettes, and 3D objects.
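A simplified version of the shape-context descriptor from step 1 can be sketched as a log-polar histogram of the other sample points' positions relative to a reference point; the bin counts and normalization here are illustrative choices, not the paper's exact parameters:

```python
import math

def shape_context(points, ref, n_r=5, n_theta=12):
    """Histogram of the other points' positions relative to `ref`,
    binned by log-radius and by angle."""
    others = [p for p in points if p != ref]
    dists = [math.hypot(p[0] - ref[0], p[1] - ref[1]) for p in others]
    r_max = max(dists)
    hist = [[0] * n_theta for _ in range(n_r)]
    for p, r in zip(others, dists):
        theta = math.atan2(p[1] - ref[1], p[0] - ref[0]) % (2 * math.pi)
        t_bin = min(n_theta - 1, int(theta / (2 * math.pi) * n_theta))
        r_bin = min(n_r - 1, int(math.log1p(r) / math.log1p(r_max) * n_r))
        hist[r_bin][t_bin] += 1
    return hist

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
h = shape_context(square, (0, 0))
print(sum(map(sum, h)))  # 3: every other sample point lands in exactly one bin
```

Two shapes can then be compared by matching points whose histograms are similar, which is the correspondence step the summary describes.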
A survey on parallel corpora alignment (andrefsantos)
This document provides a survey of methods for aligning parallel text corpora. It discusses the historical background of using parallel texts in language processing from the 1950s onward. Key early methods are described, including ones based on sentence length, lexical mapping between words, and identifying cognates. The document also evaluates major efforts to create benchmark datasets and evaluate system performance against gold standard alignments. It surveys the evolution of various alignment techniques and lists some relevant tools and projects in the field.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... (IJERD Editor)
This document discusses methods for calculating semantic similarity between terms in ontologies. It summarizes the Wu-Palmer algorithm, which calculates similarity based on the depth of terms from their closest common ancestor. The document also describes a modified "tbk" algorithm that adds a penalization factor for terms in neighboring hierarchies to address limitations of Wu-Palmer. The paper proposes a new algorithm that calculates similarity based on the direct distance between terms rather than their distance from the root node. It argues this approach could provide better results than existing edge-based algorithms like Wu-Palmer and tbk.
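The Wu-Palmer measure is easy to state on a toy taxonomy: similarity is twice the depth of the closest common ancestor, divided by the sum of the two terms' depths. The animal hierarchy below is an invented example:

```python
def ancestors(node, parent):
    """Chain from node up to the root, node included."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def depth(node, parent):
    return len(ancestors(node, parent)) - 1

def wu_palmer(a, b, parent):
    # sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b))
    anc_a = set(ancestors(a, parent))
    lcs = next(n for n in ancestors(b, parent) if n in anc_a)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))

parent = {"dog": "canine", "wolf": "canine", "canine": "mammal",
          "cat": "feline", "feline": "mammal", "mammal": "animal"}
print(wu_palmer("dog", "wolf", parent))  # 0.666...: close common ancestor
print(wu_palmer("dog", "cat", parent))   # 0.333...: common ancestor is higher up
```

The limitation the "tbk" modification targets is visible here: any two terms whose common ancestor sits near the root get a uniformly low score, regardless of how far apart they actually are.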
The document summarizes and compares schema matching and ontology mapping. It discusses how schema matching approaches can be applied to ontology mapping given the similarities between schemas and ontologies. The document outlines different categories of schema matching techniques (element-based, structure-based) and provides examples. It also summarizes several ontology mapping tools and approaches that utilize different matching strategies like string, structure, and semantic similarity.
Information Retrieval using Semantic Similarity (Saswat Padhi)
This document summarizes a seminar on artificial intelligence that covered three main topics: information retrieval using semantics and ontology, semantic similarity, and information retrieval. It discusses how semantics and ontologies can help address what information retrieval is currently lacking by providing meaning. It then covers different approaches to measuring semantic similarity based on path lengths and information content in ontologies. Finally, it discusses how information retrieval can be improved by reweighting query terms and expanding queries based on semantic similarity to related terms.
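The final point, expanding a query with semantically related terms at reduced weight, can be sketched as follows; the similarity table, threshold, and weighting scheme are invented for illustration:

```python
def expand_query(terms, sim, threshold=0.6, weight=0.5):
    """Add terms related above `threshold`, downweighted by their similarity."""
    weighted = {t: 1.0 for t in terms}
    for t in terms:
        for u, s in sim.get(t, {}).items():
            if s >= threshold and u not in weighted:
                weighted[u] = weight * s
    return weighted

# An invented similarity table; "road" falls below the expansion threshold.
sim = {"car": {"automobile": 0.9, "road": 0.4}}
print(expand_query(["car"], sim))  # {'car': 1.0, 'automobile': 0.45}
```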
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentations for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer developed notes that break down lecture and study material in a way that they can understand
# Students can earn better grades, save time and study effectively
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
This paper proposes a system to score how well an image matches a sentence and vice versa. It represents images and sentences as triplets of objects, actions, and scenes in a shared meaning space. Features from detectors, classifiers and distributional semantics are used to compute potentials for a Markov random field model. The model is trained discriminatively to match ground truth image-sentence pairs. Evaluation on a novel dataset shows the system can accurately annotate images and illustrate sentences, though failures still occur.
A Study Of Statistical Models For Query Translation: Finding A Good Unit Of T... (iyo)
This document summarizes three statistical models for query translation: a co-occurrence model, a graphical model view, and a noun phrase translation model. The co-occurrence model uses words as the translation unit and selects translations based on word co-occurrence. The graphical model views translation as an undirected graph and makes independence assumptions to simplify computations. The noun phrase model translates queries at the noun phrase level using translation templates extracted from aligned text corpora.
The document describes a system for semantic textual similarity (STS) that uses various techniques to estimate the semantic similarity between texts. The system combines lexical, syntactic, and semantic information sources using state-of-the-art algorithms. In SemEval 2016 tasks, the system achieved a mean Pearson correlation of 75.7% on the monolingual English task and 86.3% on the cross-lingual Spanish-English task, ranking first in the cross-lingual task. The system utilizes techniques such as word embeddings, paragraph vectors, tree-structured LSTMs, and word alignment to capture semantic similarity.
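The reported scores are Pearson correlations between the system's similarity scores and human gold-standard scores; the measure itself is straightforward to compute:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between system scores and gold scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfectly linear agreement
print(pearson([1, 2, 3], [3, 2, 1]))        # -1.0: perfectly reversed ranking
```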
Data Complexity in EL Family of Description Logics (Adila Krisnadhi)
The document summarizes data complexity results for reasoning in extensions of the EL family of description logics. It shows that instance checking is coNP-hard, and thus data intractable, for several extensions including EL∀r.⊥, EL∀r.C, EL∃¬r.C, ELC∪D, EL∃r+.C, and EL(≥kr) for k ≥ 2. The reductions are from the NP-complete 2+2SAT problem and use partitioning or covering concepts in the TBox along with a polynomial-sized ABox to encode truth assignments. Instance checking remains tractable for the data-tractable logics ELIf and extensions of DL
The document discusses clustering documents using a multi-viewpoint similarity measure. It begins with an introduction to document clustering and common similarity measures like cosine similarity. It then proposes a new multi-viewpoint similarity measure that calculates similarity between documents based on multiple reference points, rather than just the origin. This allows a more accurate assessment of similarity. The document outlines an optimization algorithm used to cluster documents by maximizing the new similarity measure. It compares the new approach to existing document clustering methods and similarity measures.
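As I read the summary, the multi-viewpoint measure judges a pair of documents from many reference points rather than only from the origin; a sketch of that idea (my own simplification, not the paper's exact formula) is:

```python
def multi_viewpoint_sim(di, dj, viewpoints):
    """Average, over each outside viewpoint dh, of the dot product of
    (di - dh) and (dj - dh): the pair's similarity as seen from dh."""
    def sub(u, v):
        return [a - b for a, b in zip(u, v)]
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(dot(sub(di, dh), sub(dj, dh)) for dh in viewpoints) / len(viewpoints)

# With the origin as the only viewpoint this reduces to the plain dot product.
print(multi_viewpoint_sim([1, 0], [0, 1], [[0, 0]]))    # 0.0
print(multi_viewpoint_sim([1, 0], [0, 1], [[-1, -1]]))  # 4.0
```

Shifting the viewpoint away from the origin can make two documents that look orthogonal from the origin appear similar, which is the extra information the measure exploits.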
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...) (Raj vardhan)
The Relational Data Model and Relational Database Constraints
Ch5 (Navathe 4th edition)/ Ch7 (Navathe 3rd edition)
Example of STUDENT Relation(figure 5.1)
Shape matching and object recognition using shape context belongie pami02 (irisshicat)
1) The document presents a novel approach for measuring shape similarity and using it for object recognition. It involves finding point correspondences between shapes, estimating an aligning transformation, and computing distance as a sum of matching errors and transformation magnitude.
2) At the core is using a "shape context" descriptor at sample points to solve the correspondence problem as a graph matching problem. This provides correspondences to estimate an aligning transformation.
3) Shape similarity is then a measure of matching errors between corresponding points after alignment, allowing nearest neighbor classification for recognition. Results are shown for various datasets.
This document describes a Synchronized Alternating Pushdown Automaton (SAPDA) that accepts the language of reduplication with a center marker (RCM). The SAPDA utilizes recursive conjunctive transitions to check that the nth letter before the center marker '$' is the same as the nth letter from the end of the string, for all letters n. This allows the SAPDA to accept strings of the form w$w, where w is any string over the alphabet {a,b}. The construction of the SAPDA involves states that check specific letters at specific positions relative to the center marker.
Archival Metadata: Standards and Management on the World Wide Web (Giannis Tsakonas)
From the 1970s to the present, the International Council on Archives has created and provided to the community of archives and archivists a series of standards for developing archival finding aids and related catalogues and indexes. The goal of these standards is a common understanding, a common approach, and uniformity in the creation of catalogues and authority records, and in the subject description of archives, their structure, and their content.
The World Wide Web has become one of the most important media for disseminating information, and the explosive growth of application-development technologies in its environment has led to its adoption by many different communities. In this context, archivists are called upon to encode their metadata and make it interoperable in a global environment of information and knowledge management in which all scholarly communities coexist.
The aim of the seminar is to present (a) the basic standards for managing archival information and (b) the technological background that determines how archival information can be exchanged and interlinked, interoperably, with the information produced by other organisations and communities with which archival services are directly related.
The seminar is addressed to employees of public and private archival institutions, students and graduates in archival science, librarians, and university and technological-institute graduates with related professional and scholarly interests.
The seminar is part of the activities of the Databases and Information Systems Group of the Laboratory of Digital Libraries and Electronic Publishing of the Department of Archives, Library Science and Museology of the Ionian University. It is organised in the context of the 21st International Conference on Theory and Practice of Digital Libraries and will take place at the Grand Hotel Palace, 305 Monastiriou Street, Thessaloniki, on Tuesday 19 September 2017, 14.00–17.00.
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation... (Giannis Tsakonas)
Digital libraries evaluation is characterised as an interdisciplinary and multidisciplinary domain posing a set of challenges to the research communities that intend to utilise and assess criteria, methods and tools. The amount of scientific production published in the field hinders and disorientates researchers interested in the domain. Researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively, as well as to identify new research goals and develop plans for future work. This paper proposes a methodological pathway to investigate the core topics of the digital library evaluation domain, author communities, their relationships, as well as the researchers who significantly contribute to major topics. The proposed methodology exploits topic modelling algorithms and network analysis on a corpus consisting of the digital library evaluation papers presented at the JCDL, ECDL/TPDL and ICADL conferences in the period 2001–2013.
Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum
More Related Content
Similar to Conceptual similarity: why, where and how
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors based on the angle between them. K-means clustering partitions documents into k clusters to minimize intra-cluster similarity, while hierarchical clustering merges clusters in a dendogram based on similarity. The EM algorithm computes maximum likelihood estimates of document distributions. Evaluation of clustering assesses the quality based on intra-class and inter-class similarity.
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors and is often used to compare documents, with higher values indicating more similar documents. K-means clustering partitions documents into k groups to minimize intra-cluster similarity, while hierarchical clustering creates a dendrogram of document clusters by progressively merging the most similar pairs. The EM algorithm computes maximum likelihood estimates for document clustering when data is incomplete. Evaluation of document clusters considers internal metrics like intra-cluster similarity and inter-cluster dissimilarity.
This document discusses various techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity measures the similarity between document vectors and is often used to compare documents, with higher values indicating more similar documents. K-means clustering partitions documents into k groups to minimize intra-cluster similarity, while hierarchical clustering creates a dendrogram of document clusters by progressively merging the most similar groups. The EM algorithm computes maximum likelihood estimates for document clustering when data is incomplete. Evaluation of document clusters considers internal metrics like intra-cluster similarity and inter-cluster dissimilarity.
Semantic Web technologies are a set of languages standardized by the World Wide Web Consortium (W3C) and designed to create a web of data that can be processed by machines. One of the core languages of the Semantic Web is Web Ontology Language (OWL), a family of knowledge representation languages for authoring ontologies or knowledge bases. The newest OWL is based on Description Logics (DL), a family of logics that are decidable fragments of first-order logic. leanCoR is a new description logic reasoner designed for experimenting with the new connection method algorithms and optimization techniques for DL. leanCoR is an extension of leanCoP, a compact automated theorem prover for classical first-order logic.
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Jonathon Hare
Tutorial at the "Reality of the Semantic Gap in Image Retrieval" tutorial at the first international conference on Semantics And digital Media Technology (SAMT 2006). 6th December 2006.
This document presents a novel approach for measuring shape similarity and using it for object recognition. The key steps are:
1) Solving the correspondence problem between two shapes by attaching a descriptor called "shape context" to sample points on each shape. Shape context captures the distribution of remaining points relative to the reference point.
2) Using the point correspondences to estimate an aligning transformation between the shapes. This provides a measure of shape similarity as the matching error between corresponding points plus the magnitude of the transformation.
3) Treating recognition as a nearest neighbor problem to find the most similar stored prototype shape. The approach is demonstrated on various datasets including handwritten digits, silhouettes, and 3D objects
A survey on parallel corpora alignment andrefsantos
This document provides a survey of methods for aligning parallel text corpora. It discusses the historical background of using parallel texts in language processing from the 1950s onward. Key early methods are described, including ones based on sentence length, lexical mapping between words, and identifying cognates. The document also evaluates major efforts to create benchmark datasets and evaluate system performance against gold standard alignments. It surveys the evolution of various alignment techniques and lists some relevant tools and projects in the field.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
This document discusses methods for calculating semantic similarity between terms in ontologies. It summarizes the Wu-Palmer algorithm, which calculates similarity based on the depth of terms from their closest common ancestor. The document also describes a modified "tbk" algorithm that adds a penalization factor for terms in neighboring hierarchies to address limitations of Wu-Palmer. The paper proposes a new algorithm that calculates similarity based on the direct distance between terms rather than their distance from the root node. It argues this approach could provide better results than existing edge-based algorithms like Wu-Palmer and tbk.
The document summarizes and compares schema matching and ontology mapping. It discusses how schema matching approaches can be applied to ontology mapping given the similarities between schemas and ontologies. The document outlines different categories of schema matching techniques (element-based, structure-based) and provides examples. It also summarizes several ontology mapping tools and approaches that utilize different matching strategies like string, structure, and semantic similarity.
Information Retrieval using Semantic SimilaritySaswat Padhi
This document summarizes a seminar on artificial intelligence that covered three main topics: information retrieval using semantics and ontology, semantic similarity, and information retrieval. It discusses how semantics and ontologies can help address what information retrieval is currently lacking by providing meaning. It then covers different approaches to measuring semantic similarity based on path lengths and information content in ontologies. Finally, it discusses how information retrieval can be improved by reweighting query terms and expanding queries based on semantic similarity to related terms.
This paper proposes a system to score how well an image matches a sentence and vice versa. It represents images and sentences as triplets of objects, actions, and scenes in a shared meaning space. Features from detectors, classifiers and distributional semantics are used to compute potentials for a Markov random field model. The model is trained discriminatively to match ground truth image-sentence pairs. Evaluation on a novel dataset shows the system can accurately annotate images and illustrate sentences, though failures still occur.
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
This document summarizes three statistical models for query translation: a co-occurrence model, a graphic model view, and a noun phrase translation model. The co-occurrence model uses words as the translation unit and selects translations based on word co-occurrence. The graphic model views translation as an undirected graph and makes independence assumptions to simplify computations. The noun phrase model translates queries at the noun phrase level using translation templates extracted from aligned text corpora.
The document describes a system for semantic textual similarity (STS) that uses various techniques to estimate the semantic similarity between texts. The system combines lexical, syntactic, and semantic information sources using state-of-the-art algorithms. In SemEval 2016 tasks, the system achieved a mean Pearson correlation of 75.7% on the monolingual English task and 86.3% on the cross-lingual Spanish-English task, ranking first in the cross-lingual task. The system utilizes techniques such as word embeddings, paragraph vectors, tree-structured LSTMs, and word alignment to capture semantic similarity.
Data Complexity in EL Family of Description LogicsAdila Krisnadhi
The document summarizes data complexity results for reasoning in extensions of the EL family of description logics. It shows that instance checking is coNP-hard, and thus data intractable, for several extensions including EL∀r.⊥, EL∀r.C, EL∃¬r.C, ELC∪D, EL∃r+.C, and EL(≥kr) for k ≥ 2. The reductions are from the NP-complete 2+2SAT problem and use partitioning or covering concepts in the TBox along with a polynomial-sized ABox to encode truth assignments. Instance checking remains tractable for the data-tractable logics ELIf and extensions of DL
The document discusses clustering documents using a multi-viewpoint similarity measure. It begins with an introduction to document clustering and common similarity measures like cosine similarity. It then proposes a new multi-viewpoint similarity measure that calculates similarity between documents based on multiple reference points, rather than just the origin. This allows a more accurate assessment of similarity. The document outlines an optimization algorithm used to cluster documents by maximizing the new similarity measure. It compares the new approach to existing document clustering methods and similarity measures.
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...Raj vardhan
The Relational Data Model and Relational Database Constraints
Ch5 (Navathe 4th edition)/ Ch7 (Navathe 3rd edition)
Example of STUDENT Relation (figure 5.1)
Shape matching and object recognition using shape context belongie pami02irisshicat
1) The document presents a novel approach for measuring shape similarity and using it for object recognition. It involves finding point correspondences between shapes, estimating an aligning transformation, and computing distance as a sum of matching errors and transformation magnitude.
2) At the core is using a "shape context" descriptor at sample points to solve the correspondence problem as a graph matching problem. This provides correspondences to estimate an aligning transformation.
3) Shape similarity is then a measure of matching errors between corresponding points after alignment, allowing nearest neighbor classification for recognition. Results are shown for various datasets.
This document describes a Synchronized Alternating Pushdown Automaton (SAPDA) that accepts the language of reduplication with a center marker (RCM). The SAPDA utilizes recursive conjunctive transitions to check that the nth letter before the center marker '$' is the same as the nth letter from the end of the string, for all letters n. This allows the SAPDA to accept strings of the form w$w, where w is any string over the alphabet {a,b}. The construction of the SAPDA involves states that check specific letters at specific positions relative to the center marker.
Similar to Conceptual similarity: why, where and how (20)
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόGiannis Tsakonas
From the 1970s until today, the International Council on Archives has been creating and providing the community of archives and archivists with a series of standards for developing archival finding aids and related catalogues and indexes. The aim of these standards is a common understanding, approach and uniformity in the creation of catalogues and authority records and in the subject description of archives, their structure and their content.
The World Wide Web has become one of the most important media for the circulation of information, and the explosive growth of the corresponding application-development technologies in its environment has led to its exploitation by various communities. In this context, archivists are called upon to encode their metadata and make it capable of interoperating in a global information and knowledge management environment where all scientific communities coexist.
The aim of the seminar is to present (a) the basic standards for managing archival information and (b) the technological background that determines the ways in which archival information can be exchanged and can interoperate and interconnect with the information produced by other organisations and communities with which archival services are directly related.
The seminar is addressed to employees of public and private archival institutions, students and graduates in archival science, librarians, and university and TEI graduates with related professional and scientific interests.
The seminar is part of the activities of the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives, Library Science and Museology of the Ionian University. It is organised in the context of the 21st International Conference on Theory and Practice of Digital Libraries and will take place at the Grand Hotel Palace, Monastiriou 305, Thessaloniki, on Tuesday 19 September 2017, 14.00-17.00.
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...Giannis Tsakonas
Digital libraries evaluation is characterised as an interdisciplinary and multidisciplinary domain posing a set of challenges to the research communities that intend to utilise and assess criteria, methods and tools. The amount of scientific production, which is published on the field, hinders and disorientates the researchers who are interested in the domain. The researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively as well as to identify new research goals and develop their plans for future works. This paper proposes a methodological pathway to investigate the core topics of the digital library evaluation domain, author communities, their relationships, as well as the researchers who significantly contribute to major topics. The proposed methodology exploits topic modelling algorithms and network analysis on a corpus consisting of the digital library evaluation papers presented in JCDL,ECDL/TDPL and ICADL conferences in the period 2001–2013.
Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum
Increasing traceability of physical library items through Koha: the case of S...Giannis Tsakonas
Presentation in KohaCon2016, the major event of Koha community, on May 31, 2016. The Library & Information Center, University of Patras, Greece has developed the SELIDA framework, which integrates a set of standardized and widespread library technologies in order to increase the identification and traceability of physical items, such as books. The framework makes use of RFID tags in order to assign unique identification marks, in the form of URIs that can be globally exchanged. The framework has been implemented in the fully translated and customized Koha installation of our Library and its core services support checking in/out of books and browsing of history transactions with geospatial visualization. Its use can support transactions between various libraries or branches of the same library. The proposed presentation will describe the architecture of the framework and how it connects to Koha, as well as the challenges we faced during its development.
We were group no 2: notes for the MLAS2015 workshopGiannis Tsakonas
Summary note of the discussion of group no 2 in the IFLA MLAS 2015 workshop in Athens, March 12, 2015, involving librarians (from all around the world), book palaces and … Canadian rock groups.
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταGiannis Tsakonas
Presentation at the panel "Culture and new technologies: from the perceptible to the digital world", organised as part of the "Open Floor" section of the 2014 Development Forum (Sunday 23 November 2014, 20:30-22:00, Astir Hotel, Hall II).
{Tech}changes: the technological state of Greek Libraries.Giannis Tsakonas
The document summarizes technological changes in Greek libraries over recent years. While Greek libraries were early adopters of technological changes, penetration of eBooks and sophisticated business models remains limited. However, libraries have increasingly embraced open access, open source, and open data initiatives. Projects like Kallipos provide enhanced academic textbooks online. Funding from the EU and Greece has supported centralized technological solutions and opportunities for public/private cooperation to make technology more affordable and transform literacy programs.
Affective relationships between users & libraries in times of economic stressGiannis Tsakonas
This study used the Stimulus-Organism-Response framework to identify the critical parameters that govern the affective relationships between Greek academic libraries and their users during times of economic stress. A survey of 950 library users found that social cues like willingness, kindness, and knowledge had the strongest impact on users' emotions. Emotions like satisfaction, confidence and safety positively correlated with library usage. The findings suggest that creating a welcoming environment and providing friendly service are important for positively influencing users' feelings about the library. Further research is needed to explore how these relationship factors interact and influence social and systemic interactions.
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Giannis Tsakonas
This document proposes a methodology for discovering patterns in scientific literature using a case study of digital library evaluation. It involves:
1. Classifying documents to identify relevant papers using naive Bayes classification.
2. Semantically annotating papers with concepts from a Digital Library Evaluation Ontology using the GoNTogle annotation tool. Over 2,600 annotations were generated.
3. Clustering the annotated papers into coherent groups using k-means clustering.
4. Interpreting the clusters with the assistance of the ontology to discover patterns and trends in the literature. Benchmarking tests were performed to evaluate effectiveness of the methodology.
Presentation for the seminar "Library Data in the future digital environment - FRBR and Linked Data".
The seminar was organised by the Libraries of the Law School of the National and Kapodistrian University of Athens and of the University of Piraeus on 18 and 19 June 2012, curated by the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives and Library Science of the Ionian University.
Instructors:
- Manolis Peponakis (MLIS)
- Dr. Michalis Sfakakis
- Dr. Christos Papatheodorou
Policies for geospatial collections: a research in US and Canadian academic l...Giannis Tsakonas
This document summarizes a research study on geospatial collection development policies in US and Canadian academic libraries. It includes the session overview, research framework, definitions, literature review, objectives, methodology, findings, and conclusions. The methodology involved analyzing the websites of 21 academic libraries for their geospatial policies. The findings show variability in policies but many included general information, collection details, and references to open data. The conclusions are that policies lack homogeneity and more research is needed on policies in other countries.
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Giannis Tsakonas
This document discusses developing a metadata model called ARMOS (Architecture Metadata Object Schema) for describing historic buildings. It covers traditional flat metadata descriptions of architecture, examining relationships between works and images/other works. ARMOS aims to group related buildings logically and connect them to facilitate discovery. It draws from architecture theories on morphology, typology and patterns. The conceptual model identifies entities, relationships and attributes. ARMOS is a harmonization profile combining descriptive, structural, administrative and technical metadata from various sources. Issues around terminology, extensions and interoperability are discussed.
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Giannis Tsakonas
This document summarizes a workshop on digital information management that took place in April 2012 in Corfu, Greece. It discusses some of the challenges of information retrieval when using natural language queries, including problems of ambiguity, context, and the use of knowledge organization systems and query expansion to help address these challenges. The role of user models and evaluation in understanding real language use is also mentioned.
The document summarizes a path-based approach for storing and querying multidimensional XML (MXML) data in a relational database. MXML extends XML to represent data with different facets under different contexts. The approach stores MXML nodes in separate tables based on their type and uses a path table and Dewey labeling for indexing. It represents contexts using ordered worlds and binary vectors. It also defines Multidimensional XPath (MXPath) to query MXML data using both explicit and inherited contexts.
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataGiannis Tsakonas
Presentation for the seminar "Library Data in the future digital environment - FRBR and Linked Data".
The seminar was organised by the Library and Information Center of the University of Patras, on whose premises it was held on Friday 3 February 2012, curated by the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives and Library Science of the Ionian University.
Instructors:
- Manolis Peponakis (MLIS)
- Dr. Michalis Sfakakis
- Dr. Christos Papatheodorou
This document discusses open bibliographic data and the Open Bibliographic Principles initiative. It provides background on exchanging bibliographic data and reasons for making it open, such as freeing access, facilitating collaboration, and advancing research. Complications discussed include proprietary attitudes and loss of provenance over time. The document also covers topics such as using bibliographic data as linked open data, navigating it, examples like Libris, applicable licenses, and the E-LIS experience in adopting an open license.
Evaluation is a vital research interest in the digital library domain, as exhibited by the growth of the literature in the major conferences and journals. However, it is very difficult to navigate this extended corpus. For this reason the DiLEO ontology has been developed to assist the exploration of important concepts and the discovery of trends in the evaluation of digital libraries. DiLEO is a domain ontology which aims to conceptualize the DL evaluation domain by correlating its key entities and providing reasoning paths that support the design of evaluation experiments.
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
How to Download & Install Module From the Odoo App Store in Odoo 17Celine George
Custom modules offer the flexibility to extend Odoo's capabilities, address unique requirements, and optimize workflows to align seamlessly with your organization's processes. By leveraging custom modules, businesses can unlock greater efficiency, productivity, and innovation, empowering them to stay competitive in today's dynamic market landscape. In this tutorial, we'll guide you step by step on how to easily download and install modules from the Odoo App Store.
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(TLE 100) (Lesson 1) - Prelims
Discuss the EPP Curriculum in the Philippines:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
Explain the Nature and Scope of an Entrepreneur:
-Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
Information and Communication Technology in EducationMJDuyan
(TLE 100) (Lesson 2) - Prelims
Explain the ICT in education:
Students will be able to explain the role and impact of Information and Communication Technology (ICT) in education. They will understand how ICT tools, such as computers, the internet, and educational software, enhance learning and teaching processes. By exploring various ICT applications, students will recognize how these technologies facilitate access to information, improve communication, support collaboration, and enable personalized learning experiences.
Discuss the reliable sources on the internet:
-Students will be able to discuss what constitutes reliable sources on the internet. They will learn to identify key characteristics of trustworthy information, such as credibility, accuracy, and authority. By examining different types of online sources, students will develop skills to evaluate the reliability of websites and content, ensuring they can distinguish between reputable information and misinformation.
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Conceptual similarity: why, where and how
1. Conceptual Similarity:
Why, Where, How
Michalis Sfakakis
Laboratory on Digital Libraries & Electronic Publishing,
Department of Archives and Library Sciences,
Ionian University, Greece
First Workshop on Digital Information Management, March 30-31, 2011
Corfu, Greece
2. Do we need similarity?
Are the following objects similar?
(Similarity, SIMILARITY)
As character sequences, NO!
How do they differ?
As character sequences, but case insensitive, Yes!
As English words, Yes!
Same word! They have the same definition, written differently.
4. What about the similarity of the objects?
(1, a)
The first object is the number one and the second is
the first letter of the English alphabet. Therefore, as
the first is a number and the second is a letter, they
are different!
But as enumeration labels, e.g. a chapter or a paragraph number, they both represent the first object of a list, the first chapter, paragraph, etc. Therefore, they could be considered similar!
5. Results for an Information Need
How similar are the Results? Which one to select?
6. Comparing Concepts
Are the following objects similar?
(Disease, Illness)
As English words, or as character
sequences they are not similar!
How do they differ?
As synonymous terms in a Thesaurus, they both represent the same concept (related via the equivalence relationship).
7. Comparing Hierarchies
[Figure: two example concept hierarchies*]
How similar is the node car from the left hierarchy to the node auto from the right hierarchy?
How similar are the nodes van from both hierarchies?
* [Dellschaft and Staab, 2006]
8. Similarity is a context dependent concept
Merriam-Webster's Learner's Dictionary defines similarity as*:
A quality that makes one person or thing like
another
Therefore, the context and the
characteristics in common are required in
order to specify and measure similarity
* http://www.learnersdictionary.com/search/similarity
9. Where the concept of similarity is
encountered
Machine learning
Ontology Learning
Schema & Ontology Matching and Mapping
Clustering
IR
Pattern recognition algorithms
Vital part of the Semantic Web development
10. Precision & Recall in IR, measuring
similarity between answers
Let C be the result set for a query (the retrieved
documents, i.e. the Computed set)
Also, we need to know the correct results for the
query (all the relevant documents, the Reference
set)
Precision: is the fraction of retrieved documents that
are relevant to the search
Recall: is the fraction of the documents that are
relevant to the query that are successfully retrieved
Wikipedia: http://en.wikipedia.org/wiki/Precision_and_recall
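As a minimal sketch (the document names and set contents are illustrative, not from the slides), the two definitions translate directly into set arithmetic:

```python
def precision(reference, computed):
    """Fraction of the retrieved (computed) documents that are relevant."""
    return len(reference & computed) / len(computed)

def recall(reference, computed):
    """Fraction of the relevant (reference) documents that were retrieved."""
    return len(reference & computed) / len(reference)

# Hypothetical query: 5 relevant documents, 4 retrieved, 3 of them correct.
R = {"d1", "d2", "d3", "d4", "d5"}  # Reference set (all relevant documents)
C = {"d1", "d2", "d3", "d9"}        # Computed set (retrieved documents)

print(precision(R, C))  # 0.75
print(recall(R, C))     # 0.6
```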
11. Precision & Recall measure similarity
Precision & Recall are two widely used
metrics for evaluating the correctness of
a pattern recognition algorithm
Recall and Precision depend on the
outcome (oval) of a pattern recognition
algorithm and its relation to all relevant
patterns (left) and the non-relevant
patterns (right).
The more correct results (green), the
better.
Precision: horizontal arrow.
Recall: diagonal arrow.
Wikipedia: http://en.wikipedia.org/wiki/Precision_and_recall
12. Precision & Recall, once more
Let D be the whole document collection, R the Reference set and C the Computed set
Precision
P = |R ∩ C| / |C|
Recall
R = |R ∩ C| / |R|
TP = R ∩ C (True Positives)
TN = D \ (R ∪ C) (True Negatives)
FN = R \ C (False Negatives)
FP = C \ R (False Positives)
13. Overall evaluation,
combining Precision & Recall
Given Precision & Recall, the F-measure can combine them for an overall evaluation
Balanced F-measure (P & R are evenly weighted)
F1 = 2*(P*R)/(P+R)
Weighted F-measure
Fb = (1+b^2)*(P*R)/(b^2*P+R), b non-zero
F2 (b=2) weights recall twice as much as precision
F0.5 (b=0.5) weights precision twice as much as recall
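A short sketch of the weighted F-measure (the function name and the sample P/R values are mine):

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall:
    beta > 1 favours recall, beta < 1 favours precision."""
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p, r = 0.75, 0.6
print(round(f_measure(p, r), 3))             # balanced F1: 0.667
print(round(f_measure(p, r, beta=2), 3))     # F2, recall weighted more: 0.625
print(round(f_measure(p, r, beta=0.5), 3))   # F0.5, precision weighted more: 0.714
```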
14. Measuring Similarity,
Comparing two Ontologies
A simplified definition of a core ontology*:
The structure O := (C, root, ≤C) is called a core ontology. C is a set of concept identifiers and root is a designated root concept for the partial order ≤C on C. This partial order is called concept hierarchy or taxonomy. The condition ∀c ∈ C : c ≤C root holds for this concept hierarchy.
Levels of comparison
Lexical, how terms are used to convey meanings
Conceptual, which conceptual relations exist between terms
* [Dellschaft and Staab, 2006]
15. Gold Standard based
Evaluation of Ontology Learning
OR OC
Given a pre-defined ontology
The so-called Gold Standard or Reference
Compare the Learned (Computed) Ontology
with the Gold Standard
16. Measuring Similarity -
Lexical Comparison Level LP, LR
OR OC
Lexical Precision & Lexical Recall
LP(OC, OR) = |CC ∩ CR| / |CC|
LR(OC, OR) = |CC ∩ CR| / |CR|
The lexical precision and recall reflect how well the learned lexical terms CC cover the target domain CR
For the above example LP = 4/6 = 0.67, LR = 4/5 = 0.8
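A sketch reproducing the example figures above; the concrete term sets are an assumption chosen to give 6 learned terms, 5 reference terms and 4 in common:

```python
def lexical_precision(C_computed, C_reference):
    """LP: share of learned terms that also occur in the reference."""
    return len(C_computed & C_reference) / len(C_computed)

def lexical_recall(C_computed, C_reference):
    """LR: share of reference terms that were learned."""
    return len(C_computed & C_reference) / len(C_reference)

# Illustrative term sets (ASCII 'coupe'): 4 terms shared by both ontologies.
CC = {"root", "bike", "van", "coupe", "auto", "BMX"}  # learned (computed)
CR = {"root", "bike", "van", "coupe", "car"}          # reference

print(round(lexical_precision(CC, CR), 2))  # 0.67
print(round(lexical_recall(CC, CR), 2))     # 0.8
```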
17. Measuring Similarity,
Lexical Comparison Level - aSM
Average String Matching, using edit distance
Levenshtein distance, the most common definition for edit distance, measures the minimum number of character insertions, deletions and substitutions required to transform one string into another
For example*, the Levenshtein distance between "kitten" and "sitting" is 3 (there is no way to do it with fewer than three edits)
kitten → sitten (substitution of 's' for 'k')
sitten → sittin (substitution of 'i' for 'e')
sittin → sitting (insertion of 'g' at the end)
* Wikipedia: http://en.wikipedia.org/wiki/Levenshtein_distance
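A compact dynamic-programming sketch of the Levenshtein distance, confirming the kitten/sitting example:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```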
18. Measuring Similarity,
Lexical Comparison Level String Matching
String Matching measure
(SM), given two lexical
entries L1, L2
Weights the number of the
required changes against
the shorter string
1 stands for perfect match,
0 for bad match
Average SM
Asymmetric, determines the extent to which L1 (target) is covered by L2 (source)
[Maedche and Staab, 2002]
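A sketch of the SM measure as defined in [Maedche and Staab, 2002]: the edit distance is weighted against the shorter string and clipped to [0, 1]. The helper names are mine:

```python
from functools import lru_cache

def string_match(l1: str, l2: str) -> float:
    """SM(L1, L2): edit-distance changes weighted against the shorter
    string; 1 stands for perfect match, 0 for bad match."""
    @lru_cache(maxsize=None)
    def ed(i, j):
        # Levenshtein distance between l1[:i] and l2[:j]
        if i == 0 or j == 0:
            return i or j
        return min(ed(i - 1, j) + 1,
                   ed(i, j - 1) + 1,
                   ed(i - 1, j - 1) + (l1[i - 1] != l2[j - 1]))
    shorter = min(len(l1), len(l2))
    return max(0.0, (shorter - ed(len(l1), len(l2))) / shorter)

print(string_match("similarity", "similarity"))     # 1.0
print(round(string_match("kitten", "sitting"), 2))  # (6-3)/6 = 0.5
```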
19. Measuring Similarity,
Lexical Comparison Level - RelHit
RelHit actually expresses Lexical Precision
RelHit compared to average String Matching:
Average SM reduces the influence of string pseudo-differences (e.g. singulars vs. plurals)
Average SM may introduce some noise
20. Measuring Similarity,
Conceptual Comparison Level
The conceptual level compares the semantic structure of ontologies
Conceptual structures are constituted by
Hierarchies, or by Relations
How to compare two hierarchies?
How do the positions of concepts influence
similarity of Hierarchies?
What measures to use?
21. Measuring Similarity,
Conceptual Comparison Level
OR OC
Local measures compare the positions of two concepts based on characteristic extracts from the concept hierarchies they belong to
Some characteristic extracts
Semantic Cotopy (sc)
sc(c, O) = {ci | ci ∈ C ∧ (ci ≤C c ∨ c ≤C ci)}
Common Semantic Cotopy (csc)
csc(c, O1, O2) = {ci | ci ∈ C1 ∩ C2 ∧ (ci <C1 c ∨ c <C1 ci)}
22. Measuring Similarity,
Conceptual Comparison Level sc
OR OC
Semantic Cotopy
sc(c, O) = {ci | ci ∈ C ∧ (ci ≤C c ∨ c ≤C ci)}
Semantic Cotopy examples
sc(root, OR) = {root, bike, car, van, coupé}
sc(root, OC) = {root, bike, auto, BMX, van, coupé}
sc(bike, OR) = {root, bike}
sc(bike, OC) = {root, bike, BMX}
sc(car, OR) = {root, car, van, coupé}
sc(auto, OC) = {root, auto, van, coupé}
23. Measuring Similarity,
Conceptual Comparison Level csc
OR OC
Common Semantic Cotopy
csc(c, O1, O2) = {ci | ci ∈ C1 ∩ C2 ∧ (ci < c ∨ c < ci)}
Common Semantic Cotopy examples
C1 ∩ C2 = {root, bike, van, coupé}
csc(root, OR, OC) = {bike, van, coupé}
csc(root, OC, OR) = {bike, van, coupé}
csc(bike, OR, OC) = csc(bike, OC, OR) = {root}
csc(car, OR, OC) = {root, van, coupé}
csc(auto, OC, OR) = {root, van, coupé}
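The two cotopy extracts can be sketched over simple parent maps. The edge lists below are an assumption reconstructing the example hierarchies from the slides (ASCII 'coupe'; in O_R car is under root, in O_C auto replaces car and BMX sits under bike):

```python
# Parent-child edges of the two example hierarchies (child -> parent).
OR = {"bike": "root", "car": "root", "van": "car", "coupe": "car"}
OC = {"bike": "root", "auto": "root", "BMX": "bike",
      "van": "auto", "coupe": "auto"}

def ancestors(c, parents):
    """All concepts strictly above c in the hierarchy."""
    out = set()
    while c in parents:
        c = parents[c]
        out.add(c)
    return out

def concepts(parents):
    """All concept identifiers appearing in the hierarchy."""
    return set(parents) | set(parents.values())

def sc(c, parents):
    """Semantic cotopy: c plus all its super- and sub-concepts."""
    return ({c} | ancestors(c, parents)
            | {x for x in concepts(parents) if c in ancestors(x, parents)})

def csc(c, parents1, parents2):
    """Common semantic cotopy: super/sub-concepts of c restricted to
    the concepts shared by both ontologies, excluding c itself."""
    common = concepts(parents1) & concepts(parents2)
    return (sc(c, parents1) - {c}) & common

print(sorted(sc("car", OR)))       # ['car', 'coupe', 'root', 'van']
print(sorted(csc("car", OR, OC)))  # ['coupe', 'root', 'van']
```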
24. Measuring Similarity, Conceptual
Comparison Level local measures tp, tr
OR OC
Local taxonomic precision using characteristic extracts
tp_ce(c1, c2, OC, OR) = |ce(c1, OC) ∩ ce(c2, OR)| / |ce(c1, OC)|
Local taxonomic recall using characteristic extracts
tr_ce(c1, c2, OC, OR) = |ce(c1, OC) ∩ ce(c2, OR)| / |ce(c2, OR)|
25. Measuring Similarity, Conceptual
Comparison Level local measures tp
OR OC
Local taxonomic precision example using sc
sc(bike, OR) = {root, bike},
sc(bike, OC) = {root, bike, BMX}
tp_sc(bike, bike, OC, OR) = |{root, bike}| / |{root, bike, BMX}|
tp_sc(bike, bike, OC, OR) = 2/3 = 0.67
[Maedche and Staab, 2002]
26. Measuring Similarity, Conceptual
Comparison Level local measures tp
OR OC
Local taxonomic precision example using sc
sc(car, OR) = {root, car, van, coupé},
sc(auto, OC) = {root, auto, van, coupé}
tp_sc(auto, car, OC, OR) = |{root, van, coupé}| / |{root, auto, van, coupé}|
tp_sc(auto, car, OC, OR) = 3/4 = 0.75
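The worked example can be checked mechanically; the two cotopy sets are copied from the slides (ASCII 'coupe'), and the function name is mine:

```python
# Semantic cotopies of the concept pair, taken from the slides' example.
sc_auto_OC = {"root", "auto", "van", "coupe"}  # sc(auto, O_C)
sc_car_OR = {"root", "car", "van", "coupe"}    # sc(car, O_R)

def local_tp(ce_c1_OC, ce_c2_OR):
    """tp_ce(c1, c2, O_C, O_R) = |ce(c1,O_C) ∩ ce(c2,O_R)| / |ce(c1,O_C)|"""
    return len(ce_c1_OC & ce_c2_OR) / len(ce_c1_OC)

print(local_tp(sc_auto_OC, sc_car_OR))  # 0.75
```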
28. Measuring Similarity, Conceptual
Comparison Level Overall evaluation
The F-measure again, but now using Global Taxonomic Precision (TP) and Global Taxonomic Recall (TR)
Balanced Taxonomic F-measure (TP & TR are evenly weighted)
TF1 = 2*(TP*TR)/(TP+TR)
Weighted TF-measure
TFb = (1+b^2)*(TP*TR)/(b^2*TP+TR), b non-zero
TF2 (b=2) weights recall twice as much as precision
TF0.5 (b=0.5) weights precision twice as much as recall
30. References & Further Reading
Dellschaft, K. and Staab, S. (2006): On How to Perform a Gold Standard Based Evaluation of Ontology Learning. In: Cruz, I. et al. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg
Maedche, A. and Staab, S. (2002): Measuring Similarity between Ontologies. In: Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW 2002). Sigüenza, Spain
Staab, S. and Hotho, A.: Semantic Web and Machine Learning Tutorial (available at http://www.uni-koblenz.de/~staab/Research/Events/ICML05tutorial/icml05tutorial.pdf)
Bellahsene, Z., Bonifati, A. and Rahm, E. (eds.) (2011): Schema Matching and Mapping. Springer-Verlag, Heidelberg. ISBN 978-3-642-16517-7