The document describes a framework for optimizing named entity disambiguation by selecting and exploiting relevant semantic data based on the disambiguation scenario. It proposes constructing evidence models specifying how semantic entities can provide disambiguation evidence for target entities in different scenarios. An evaluation shows the framework achieves better disambiguation than generic approaches in two scenarios: football match descriptions and military conflict texts. Future work includes fully automating evidence model construction and combining ontological and statistical methods.
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION (ijnlc)
Word Sense Disambiguation (WSD) is an important area with an impact on the performance of computational-linguistics applications such as machine translation, information retrieval, text summarization, and question answering systems. We present a brief history of WSD and discuss the supervised, unsupervised, and knowledge-based approaches to it. Though many WSD algorithms exist, we consider optimal and portable WSD algorithms the most appropriate, since they can be embedded easily in computational-linguistics applications. This paper also gives an overview of several WSD algorithms and their performance, comparing them and assessing the need for word sense disambiguation.
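The knowledge-based family of approaches mentioned above can be illustrated with a minimal Lesk-style sketch: choose the sense whose dictionary gloss shares the most words with the surrounding context. The two-sense inventory below is a hypothetical toy, not taken from any real lexicon.

```python
def lesk(context_words, senses):
    """Return the sense id whose gloss overlaps most with the context."""
    best_sense, best_overlap = None, -1
    context = set(w.lower() for w in context_words)
    for sense_id, gloss in senses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Toy sense inventory for the ambiguous word "bank" (invented glosses):
SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a body of water such as a river",
}

print(lesk("I deposited money at the bank and withdrew cash".split(), SENSES))
# -> bank/finance
```

Real systems use a full lexicon such as WordNet and richer overlap measures, but the core selection step is this comparison of context against glosses.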
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE (kevig)
A value-based approach to Natural Language Understanding, in particular the disambiguation of pronouns, is illustrated with a solution to a typical example from the Winograd Schema Challenge. The worked example uses a language engine, Enguage, to support the articulation of the advocacy and fearing of violence. The example illustrates the indexical nature of pronouns and how their values, their referent objects, change because they are set by contextual data. It must be noted that Enguage is not a suitable candidate for addressing the Winograd Schema Challenge, as it is an interactive tool, whereas the Challenge requires a preconfigured, unattended program.
Automatic text summarization is the process of reducing the text content while retaining the important points of the document. Generally, there are two approaches to automatic text summarization: extractive and abstractive. The process of extractive text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive text summarization approaches used by researchers, and present the features used in the extractive summarization process. We also present the available linguistic pre-processing tools, with their features, that are used for automatic text summarization, and discuss the tools and parameters useful for evaluating the generated summaries. Moreover, we explain our proposed lexical chain analysis approach for extractive automatic text summarization, with sample generated lexical chains, and provide the evaluation results of our system-generated summaries. The proposed lexical chain analysis approach can be used to solve different text mining problems such as topic classification, sentiment analysis, and summarization.
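The lexical chain idea above can be sketched as a greedy grouping step: each word joins the first chain that already contains a related word, otherwise it starts a new chain. Real systems derive relatedness from WordNet relations; the hand-made relatedness table below is a hypothetical stand-in used only to show the chaining mechanics.

```python
# Hypothetical relatedness table (a real system would consult WordNet).
RELATED = {
    "car": {"car", "vehicle", "automobile", "wheel"},
    "vehicle": {"car", "vehicle", "automobile"},
    "automobile": {"car", "vehicle", "automobile"},
    "wheel": {"car", "wheel"},
    "banana": {"banana", "fruit"},
    "fruit": {"banana", "fruit"},
}

def build_chains(words):
    """Greedily attach each word to the first chain holding a related word."""
    chains = []
    for w in words:
        for chain in chains:
            if any(w in RELATED.get(c, {c}) or c in RELATED.get(w, {w})
                   for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])   # no related chain found: start a new one
    return chains

print(build_chains(["car", "banana", "vehicle", "fruit", "wheel"]))
# -> [['car', 'vehicle', 'wheel'], ['banana', 'fruit']]
```

Sentences covered by the strongest chains are then the candidates for extraction into the summary.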
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra... (Panos Alexopoulos)
The advent and wide proliferation of the Social Web in recent years has promoted social interaction as an important factor influencing the way enterprises and organizations conduct business. Among the fields influenced is Enterprise Knowledge Management, where the adoption of social computing approaches aims at achieving and maintaining high levels of active user participation in the organization's knowledge management activities. An important challenge towards this is achieving the right balance between the informality of socially generated data and the required formality of enterprise knowledge. In this context, we focus on the problem of mining vague knowledge from social content generated within an enterprise framework, and we propose a learning framework based on microblogging and fuzzy ontologies.
Data Models or Conceptual Schemas are in fact small language definitions, as they constrain the kinds of facts that can be stored or expressed in their resulting databases. This is the cause of the huge data integration problems in information management. Instead, data analysts should use a universal language and build databases that allow the expression of any fact in that language.
Domain Driven Design main concepts
This presentation is a summary of the book "Domain Driven Design" from InfoQ.
Here is the link: http://www.infoq.com/minibooks/domain-driven-design-quickly
A Survey of Object Oriented Programming Languages
Maya Hristakeva, RadhaKrishna Vuppala
Univ. of California, Santa Cruz
{mayah,vrk}@soe.ucsc.edu
1 Abstract
Object-oriented programming has become a very important programming paradigm of our times.
From the time it was brought into existence by Simula, object-oriented programming has seen wide
acceptance. Object-oriented programming languages (OOPLs) directly support the object notions of
classes, inheritance, information hiding (encapsulation), and dynamic binding (polymorphism). There
is a wide variety of implementations for each of these concepts, and there is no general agreement as to
how a particular concept must be interpreted. This survey takes a detailed look at some of the concepts
considered fundamental to object-orientation, namely inheritance and polymorphism. Different aspects
of inheritance and polymorphism are implemented in various popular OOPLs. We conclude with the
observation that there is still a lot of work to be done to reach a common ground for these crucial features
of OOPLs. This survey presents a detailed comparison of Java, C++, C#, Eiffel, Smalltalk, Ruby and
Python in terms of their inheritance and polymorphism implementations. The paper also presents a
compilation of the observations made by several earlier surveys [1, 27].
2 Introduction
There is a big variety of programming languages catering to various kinds of development requirements. Three of the main categories are procedural languages (e.g. C, Pascal, etc.), functional languages (e.g. Haskell, OCaml, etc.), and object-oriented programming languages (e.g. C++, Java, etc.). The object-oriented design paradigm has been popular for quite some time, owing its success to the powerful features it offers for making program development easy and robust. OOPLs, such as C++ and Java, offer an intuitive way of developing programs and provide powerful features for supporting program development. While languages like C can be used to develop programs that follow an object-oriented design, the support for features such as inheritance, encapsulation, strong typing, exception handling, etc. in OOPLs makes them more suitable for such development.
While the object-oriented programming paradigm provides a more intuitive way of programming, it also has complexities. This is due to the various complex features that the paradigm offers. OOPLs differ widely in the way they implement features that are associated with object-oriented design. For example, some languages support multiple inheritance while others consider it a bad feature. In this survey we discuss the various features of object-oriented programs and how the languages we considered (Java, C++, C#, Eiffel, Smalltalk, Ruby and Python) differ in implementing these features.
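The multiple-inheritance point can be made concrete in Python, one of the surveyed languages that permits it. Python resolves the classic "diamond" ambiguity with the C3 method resolution order (Java and C#, by contrast, forbid multiple class inheritance and offer interfaces instead). The class names below are invented for illustration.

```python
class Storable:
    def describe(self):
        return "storable"

class Printable:
    def describe(self):
        return "printable"

class Document(Storable, Printable):   # inherits describe() from both bases
    pass

d = Document()
# The C3 linearization searches Storable before Printable,
# so the Storable version of describe() wins:
print(d.describe())                               # -> storable
print([c.__name__ for c in Document.__mro__])
# -> ['Document', 'Storable', 'Printable', 'object']
```

The deterministic, inspectable MRO is precisely the kind of design decision on which, as the survey notes, languages have reached no common ground.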
The survey is organized as follows. Section 3 describes in detail the key concepts of OOPLs. Section 4 presents a brief historical view of the OOPLs and gives a ...
Conceptual Interoperability and Biomedical Data (Jim McCusker)
The goals of conceptual interoperability are:
Make similar but distinct data resources available for search, conversion, and inter-mapping in a way that mirrors human understanding of the data being searched.
Make data resources that use cross-cutting models (HL7-RIM, provenance models, etc.) interoperable with domain-specific models without explicit mappings between them.
The emergence in recent years of initiatives like Linked Open Data (LOD) has led to a significant increase in the amount of structured semantic data on the Web. In this paper we argue that the shareability and wider reuse of such data can very often be hampered by the existence of vagueness within it, as this makes the data's meaning less explicit. As a way to reduce this problem, we propose a vagueness metaontology that can represent explicitly the nature and characteristics of vague elements within semantic data.
Microposts Ontology Construction Via Concept Extraction (dannyijwest)
The social networking website Facebook offers its users a feature called “status updates” (or just “status”), which allows users to create Microposts directed to all their contacts, or a subset thereof. Readers can respond to Microposts, or additionally click a “Like” button to show their appreciation for a certain Micropost. Adding semantic meaning, in the sense of unambiguous intended ideas, to such Microposts is a step towards the Semantic Web, achieved by adding semantic annotation to web resources. Ontologies are used to specify the meaning of annotations. An ontology provides a vocabulary for representing and communicating knowledge about some topic, and a set of semantic relationships that hold among the terms in that vocabulary. To increase the efficiency of ontology-based applications, there is a need to develop a mechanism that reduces the manual work of developing an ontology. In this paper we present a method that extracts meaningful knowledge from Microposts shared on social platforms and uses it to construct a Microposts ontology. The process involves several steps of Micropost analysis: extraction of keywords and named entities, and their matching to ontological concepts.
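The keyword-extraction and concept-matching steps that the abstract outlines can be sketched as a small pipeline. The stopword list and the micro-ontology below are hypothetical placeholders for real NLP tooling and a real ontology, used only to show the shape of the process.

```python
# Hypothetical stopword list and term-to-concept ontology (illustrative only).
STOPWORDS = {"the", "a", "an", "is", "at", "my", "in", "to", "and", "eating"}

ONTOLOGY = {                      # surface term -> ontological concept
    "pizza": "Food",
    "rome": "City",
    "colosseum": "Monument",
}

def extract_concepts(micropost):
    """Tokenize, drop stopwords, and map remaining terms to concepts."""
    tokens = [t.strip(".,!?").lower() for t in micropost.split()]
    keywords = [t for t in tokens if t and t not in STOPWORDS]
    return {t: ONTOLOGY[t] for t in keywords if t in ONTOLOGY}

print(extract_concepts("Eating pizza at the Colosseum in Rome!"))
# -> {'pizza': 'Food', 'colosseum': 'Monument', 'rome': 'City'}
```

A real implementation would add named entity recognition and fuzzy matching against concept labels, but the keyword-then-match structure stays the same.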
Metrics for Evaluating Quality of Embeddings for Ontological Concepts (Saeedeh Shekarpour)
Although there is an emerging trend towards generating embeddings, primarily for unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet. This deficiency is felt even more for embeddings generated from structured data, because there are no concrete evaluation metrics measuring the quality of the encoded structural and semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, within the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, with respect to this framework, multiple experimental studies were run to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight into quality comparisons of embeddings for ontological concepts. Our sample data and code are available at https://github.com/alshargi/Concept2vec under the GNU General Public License v3.0.
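One natural reading of the "categorization aspect" above is that a concept's embedding should sit close to the centroid of its instances' embeddings; this hedged sketch computes such a score with cosine similarity. The 3-dimensional vectors are invented purely to illustrate the metric, not taken from any published model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def categorization_score(concept_vec, instance_vecs):
    """Similarity between the concept and the centroid of its instances."""
    centroid = np.mean(instance_vecs, axis=0)
    return cosine(np.asarray(concept_vec), centroid)

city = [1.0, 0.0, 1.0]                           # concept "City" (toy vector)
instances = [[0.9, 0.1, 1.1], [1.1, -0.1, 0.9]]  # e.g. instance cities
print(round(categorization_score(city, instances), 3))   # -> 1.0
```

A score near 1 indicates that the embedding space places the concept where its instances cluster; averaging this over many concepts yields a dataset-level quality metric.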
Semantic Modeling for Information Federation (Cory Casanave)
Semantic Modeling for Information Federation describes a UML profile and methodology for conceptual modeling, and for using conceptual reference models for the federation and integration of information, systems and organizations. This presentation contains both an introduction and detail appropriate for experienced architects.
Trust Evaluation Using an Improved Context Similarity Measurement (ijbiss)
In context-aware trust evaluation, using an ontology tree is a popular approach to representing the relations between contexts. Usually, the similarity between two contexts is computed using these trees; therefore, the performance of trust evaluation depends highly on the quality of the ontology trees. Fairness, or granularity consistency, is one of the major limitations affecting the quality of an ontology tree: in most ontology trees the semantic similarity of every two adjacent nodes is unequal, which deteriorates the performance of context similarity computation. We overcome this limitation by weighting tree edges based on their semantic similarity. The weight of each edge is computed using the Normalized Similarity Score (NSS) method, which is based on the frequencies with which concepts (words) co-occur in the pages indexed by search engines. Our experiments demonstrate the better performance of the proposed approach in comparison with established trust evaluation approaches. The suggested approach can enhance the efficiency of any solution that models semantic relations with an ontology tree.
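The abstract does not give the exact NSS formula, so the following is only a hedged sketch of the underlying idea: weight an ontology-tree edge by how often its two concepts co-occur in indexed pages, normalized by their individual frequencies. The page counts are invented; a real system would query a search engine's hit counts.

```python
def nss(hits_a, hits_b, hits_ab):
    """Co-occurrence count normalized by the rarer single-term count.

    hits_a, hits_b: pages containing each concept alone;
    hits_ab: pages containing both. Returns a score in [0, 1].
    """
    return hits_ab / min(hits_a, hits_b) if min(hits_a, hits_b) else 0.0

# Hypothetical counts: a semantically close pair vs an unrelated pair.
print(nss(5_000_000, 2_000_000, 1_500_000))   # ("car", "vehicle") -> 0.75
print(nss(5_000_000, 3_000_000, 30_000))      # ("car", "banana")  -> 0.01
```

Edges between closely related concepts thus receive higher weights, restoring the granularity consistency the abstract identifies as the main limitation of unweighted trees.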
Ontology is the study of what kinds of things exist, that is, what entities there are in the universe. The word derives from the Greek onto (being) and logia (written or spoken discourse). It is a branch of metaphysics, the study of first principles or the root of things.
A set of practical strategies and techniques for tackling vagueness in data modeling and creating models that are semantically more accurate and interoperable.
Troubleshooting and Optimizing Named Entity Resolution Systems in the Industry (Panos Alexopoulos)
Named Entity Resolution (NER) is an information extraction task that involves detecting mentions of named entities within texts and mapping them to
their corresponding entities in a given knowledge resource. Systems and frameworks for performing NER have been developed both by the academia and the industry with different features and capabilities. Nevertheless, what all approaches have in common is that their satisfactory performance in a given scenario does not constitute a trustworthy predictor of their performance in a different one, the reason being the scenario’s different characteristics (target entities, input texts, domain knowledge etc.). With that in mind, we describe a metric-based Diagnostic Framework that can be used to identify the causes behind the low performance of NER systems in industrial settings and take appropriate actions to increase it.
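A minimal sketch of the kind of metric such a diagnostic framework computes: precision and recall per entity type, so that a performance drop can be traced to specific target entities rather than observed only in aggregate. The gold and predicted annotations below are invented examples, not from the paper.

```python
def per_type_scores(gold, pred):
    """gold/pred: sets of (mention, entity_type) pairs.

    Returns {entity_type: (precision, recall)}.
    """
    scores = {}
    types = {t for _, t in gold | pred}
    for t in types:
        g = {m for m, tt in gold if tt == t}   # gold mentions of this type
        p = {m for m, tt in pred if tt == t}   # predicted mentions
        tp = len(g & p)                        # true positives
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[t] = (precision, recall)
    return scores

gold = {("Real Madrid", "Team"), ("Madrid", "City")}
pred = {("Real Madrid", "Team"), ("Madrid", "Team")}
print(per_type_scores(gold, pred))
```

Here the breakdown reveals a confusion between the Team and City types for the ambiguous mention "Madrid", exactly the scenario-specific failure mode the framework is designed to surface.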
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven SummarizationPanos Alexopoulos
The emergence in the last years of initiatives like the Linked Open Data (LOD) has led to a significant increase of the amount of structured semantic data on the Web. Nevertheless, the wider reuse of such public semantic data is inhibited by the difficulty for users to decide whether a given dataset is actually suitable for their needs. This is because semantic datasets typically cover diverse domains, do not follow a unified way of organizing the knowledge and may differ in a number of dimensions. With that in mind, in this paper, we report our work in progress on a goal-driven dataset summarization approach that may facilitate better understanding and reuse-oriented evaluation of available semantic data.
The phenomenon of vagueness, manifested by terms and concepts like Tall, Red, Modern, etc., is quite common in human knowledge and it is related to our inability to precisely determine the extensions of such terms due to their blurred applicability boundaries. In the context of Ontologies and Semantic Web, vagueness is primarily treated by means of Fuzzy Ontologies, namely extensions of classical ontologies that apply truth degrees to vague ontological elements in an effort to quantify their vagueness and reason with it. Nevertheless, while a number of fuzzy conceptual formalisms and fuzzy ontology language extensions for representing vagueness in ontologies have been proposed by the community, the methodological issues entailed within the development process of such ontologies have been rather neglected. In this talk we position vagueness within the overall lifecycle of semantic information management and we present IKARUS-Onto, a methodology for engineering fuzzy ontologies that covers all typical ontology development stages, from specification to validation.
MUTUAL FUNDS (ICICI Prudential Mutual Fund) BY JAMES RODRIGUESWilliamRodrigues148
Mutual funds are investment vehicles that pool money from multiple investors to purchase a diversified portfolio of stocks, bonds, or other securities. They are managed by professional portfolio managers or investment companies who make investment decisions on behalf of the fund's investors.
World economy charts case study presented by a Big 4
World economy charts case study presented by a Big 4
World economy charts case
World economy charts case study presented by a Big 4
World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4
World economy charts case study presented by a Big 4
World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4World economy charts case study presented by a Big 4study presented by a Big 4
The E-Way Bill revolutionizes logistics by digitizing the documentation of goods transport, ensuring transparency, tax compliance, and streamlined processes. This mandatory, electronic system reduces delays, enhances accountability, and combats tax evasion, benefiting businesses and authorities alike. Embrace the E-Way Bill for efficient, reliable transportation operations.
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
1. Scenario-Driven Selection and Exploitation of
Semantic Data for Optimal Named Entity
Disambiguation
Panos Alexopoulos, Carlos Ruiz, Jose Manuel Gomez Perez
1st Semantic Web and Information Extraction Workshop,
Galway, Ireland, October 9th, 2012
2. Agenda
Introduction
Problem Definition and Paper Focus
Approach Overview and Rationale
Proposed Disambiguation Framework
Disambiguation Evidence Model
Entity Disambiguation Process
Framework Evaluation
Evaluation Process
Evaluation Results
Conclusions and Future Work
3. Introduction
Problem Definition
Entity Resolution & Disambiguation
● Named entity resolution involves detecting mentions of named entities (e.g. people,
organizations or locations) within texts and mapping them to their corresponding
entities in a given knowledge source.
● One important challenge in this task is the correct disambiguation of the detected
entities.
● For example:
● “Siege of Tripolitsa took place in Tripoli with Theodoros Kolokotronis being the
leader of the Greeks. This event marked an early victory for the fight for
independence from Turkey but it was also a massacre against the Muslim and
Jewish population of the city”.
● The term “Tripoli” here refers to http://dbpedia.org/resource/Tripoli,_Greece
but it can be mistaken, for example, with Tripoli in Libya or that in Lebanon.
4. Introduction
Disambiguation Approaches
● The majority of disambiguation approaches rely on the strong contextual
hypothesis that terms with similar meanings are often used in similar contexts.
● The role of these contexts is typically played by already annotated documents (e.g.
Wikipedia articles) which are used to train term classifiers.
● These classifiers link a term to its correct meaning entity, based on the similarity
between the term’s textual context and the contexts of its potential entities.
● Some more recent approaches utilize semantic structures in order to determine this
similarity in a semantic way.
● The effectiveness of these latter approaches is highly dependent on:
● The availability of comprehensive semantic information.
● The degree of alignment between the content of the texts to be
disambiguated and the semantic data to be used.
5. Introduction
Alignment and its Importance
● Alignment means that the ontology’s elements should cover the domain(s) of the
texts to be disambiguated but should not contain other additional elements that:
● Do not belong to the domain.
● Do belong to it but do not appear in the texts.
● For example assume the text “Ronaldo scored two goals for Real Madrid“ from a
contemporary football match description.
● To disambiguate the term “Ronaldo” using an ontology, the only contextual
evidence that can be used is the entity “Real Madrid”.
● Yet there are two players with that name that are semantically related to Real:
● Cristiano Ronaldo (current player)
● Ronaldo Luis Nazario de Lima (former player).
● This means that if both relations are considered then the term will not be
disambiguated.
6. Introduction
Towards Better Alignment
● In the previous example the fact that the text describes a contemporary football
match suggests that, in general, the relation between a team and its former players
is not expected to appear in it.
● Thus, for such texts, it would make sense to ignore this relation in order to achieve
more accurate disambiguation.
● Based on this observation, we make two claims:
● That there are certain scenarios where a priori knowledge is available about
what entities and relations are expected to be present in the text.
● That this knowledge can be exploited for better alignment between semantic
information and content leading to more effective disambiguation.
● To verify these claims we define an entity disambiguation framework that can
perform better disambiguation in such scenarios.
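The scenario-filtering idea above can be sketched in a few lines. The scenario names and relation labels below are illustrative placeholders, not actual DBpedia properties:

```python
# Relations expected to appear in texts of each scenario (illustrative labels).
SCENARIO_RELATIONS = {
    "contemporary_football_match": {"currentPlayer", "manager", "stadium"},
    "historical_conflict": {"location", "commander", "partOf"},
}

def filter_evidence(triples, scenario):
    """Keep only the triples whose relation is expected in the scenario."""
    allowed = SCENARIO_RELATIONS[scenario]
    return [(s, p, o) for (s, p, o) in triples if p in allowed]

triples = [
    ("Real_Madrid", "currentPlayer", "Cristiano_Ronaldo"),
    ("Real_Madrid", "formerPlayer", "Ronaldo_Nazario"),
]
# For a contemporary match, only the current-player relation survives,
# so "Ronaldo" can be resolved to the current player.
filtered = filter_evidence(triples, "contemporary_football_match")
```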
7. Proposed Framework
Approach
● We target the task of entity disambiguation based on the intuition that a given
ontological entity is more likely to represent the meaning of an ambiguous term
when the text contains many entities ontologically related to it.
● E.g. in the example text the entities “Siege of Tripolitsa” and “Theodoros
Kolokotronis” indicate that the term “Tripoli” refers to the city of Greece.
● These evidential entities are derived from one or more domain ontologies.
● However, which entities and to what extent may serve as evidence in a given
application scenario depends on the domain and expected content of the texts.
● For that, the key ability our framework provides to its users is to construct, in a
semi-automatic manner, semantic evidence models for specific disambiguation
scenarios and use them to perform entity disambiguation within them.
8. Proposed Framework
Framework Components
● A Disambiguation Evidence Model that contains the semantic entities that may
serve as disambiguation evidence for the scenario’s target entities in the given
scenario.
● Each pair of a target entity and an evidential one is accompanied by a degree
that quantifies the latter’s evidential power for the given target entity.
● A Disambiguation Evidence Model Construction Process that builds, in a
semi-automatic manner, a disambiguation evidence model for a given scenario.
● An Entity Disambiguation Process that uses the evidence model to detect and
extract from a given text terms that refer to the scenario’s target entities.
● Each term is linked to one or more possible entity URIs along with a confidence
score calculated for each of them.
● The entity with the highest confidence should be the one the term actually refers
to.
9. Proposed Framework
Disambiguation Evidence Model
● Defines for each ontology entity which other instances and to what extent should be
used as evidence towards its correct meaning interpretation.
● It consists of entity pairs where a particular entity provides quantified evidence for
another one.
10. Proposed Framework
Evidence Model Construction
● Construction of the evidence model depends on the characteristics of the domain and
the texts.
● The first step of the construction is manual and involves:
● The identification of the concepts whose instances we wish to disambiguate
(e.g. locations)
● The determination, for each of these concepts, of the concepts related to them
whose instances may serve as contextual disambiguation evidence.
● For example, in texts that describe historical events, some concepts
whose instances may act as location evidence are related locations,
historical events, and historical groups and persons.
● The identification, for each pair of evidence and target concept, of the relation
paths that link them.
11. Proposed Framework
Evidence Model Construction
● The result of this first step is a table like the following ones:
12. Proposed Framework
Evidence Model Construction
● Based on these tables, the second step of the construction is automatic and involves
the generation of the target-evidence entity pairs along with a disambiguation
evidential strength.
● This strength is inversely proportional to the number of different same-name target
entities a given evidential entity provides evidence for.
● For example, “Getafe” provides evidence for “Pedro Leon” with a strength of 0.5
because the team has another player also called Pedro.
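The strength computation described above can be sketched as follows. The second player URI is a hypothetical placeholder (the deck only says Getafe has another player called Pedro):

```python
def evidential_strengths(related_targets, name_of):
    """related_targets: evidential entity URI -> target entity URIs it relates to
       name_of: target entity URI -> surface name of that entity"""
    strengths = {}
    for evidential, targets in related_targets.items():
        # group this evidential entity's targets by surface name
        by_name = {}
        for t in targets:
            by_name.setdefault(name_of[t], []).append(t)
        for t in targets:
            # strength is the inverse of how many same-name target
            # entities this evidential entity provides evidence for
            strengths[(evidential, t)] = 1.0 / len(by_name[name_of[t]])
    return strengths

# "Pedro Leon" and a second, hypothetical "Pedro" both play for Getafe.
name_of = {"dbpedia:Pedro_Leon": "Pedro", "dbpedia:Pedro_Other": "Pedro"}
strengths = evidential_strengths({"dbpedia:Getafe_CF": list(name_of)}, name_of)
```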
13. Proposed Framework
Entity Resolution Process
● Step 1: We extract from the text the terms that possibly refer to the target entities as
well as those that refer to their respective evidential entities.
● Extraction is performed with Knowledge Tagger, an in-house tool based on
GATE.
● Step 2: Using the evidential entities we compute for each extracted target entity term
the confidence that it refers to a particular target entity.
● The target entity with the highest confidence is expected to be the correct one.
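Step 2 can be illustrated with a simple additive aggregation of evidential strengths; the paper's exact confidence formula may differ:

```python
def score_candidates(candidates, evidence_in_text, strengths):
    """candidates: candidate target URIs for one ambiguous term
       evidence_in_text: evidential entity URIs extracted from the text
       strengths: (evidential URI, target URI) -> evidential strength"""
    scores = {t: sum(strengths.get((ev, t), 0.0) for ev in evidence_in_text)
              for t in candidates}
    # the candidate with the highest confidence is taken as the referent
    return max(scores, key=scores.get), scores

# Evidence model fragment for the Tripoli example from the introduction.
strengths = {
    ("dbpedia:Siege_of_Tripolitsa", "dbpedia:Tripoli,_Greece"): 1.0,
    ("dbpedia:Theodoros_Kolokotronis", "dbpedia:Tripoli,_Greece"): 1.0,
}
best, scores = score_candidates(
    ["dbpedia:Tripoli,_Greece", "dbpedia:Tripoli,_Libya"],
    ["dbpedia:Siege_of_Tripolitsa", "dbpedia:Theodoros_Kolokotronis"],
    strengths,
)
```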
14. Proposed Framework
Disambiguation Example
Correct disambiguation of the term Atletico
15. Framework Evaluation
Evaluation Process
Description
● Two disambiguation scenarios:
● Football match descriptions.
● Texts describing military conflicts.
● DBPedia as a source of semantic information in both cases.
● Disambiguation effectiveness measured through precision and recall.
● Evaluation results were compared to those achieved by two publicly available
semantic annotation and disambiguation systems:
● DBPedia Spotlight
● AIDA
● The two systems:
● Also use DBPedia as a knowledge source.
● Provide users with the capability to select the classes whose instances are to
be included in the process.
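Precision and recall over entity links are computed in the standard way; the links below are made-up examples, not results from the evaluation:

```python
def precision_recall(predicted, gold):
    """predicted, gold: sets of (term, entity URI) disambiguation links."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # links that match the gold standard exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

p, r = precision_recall(
    predicted={("Ronaldo", "dbpedia:Cristiano_Ronaldo"),
               ("Real", "dbpedia:Real_Madrid")},
    gold={("Ronaldo", "dbpedia:Cristiano_Ronaldo"),
          ("Real", "dbpedia:Real_Madrid_CF"),
          ("Messi", "dbpedia:Lionel_Messi")},
)
# One of two predicted links is correct; one of three gold links recovered.
```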
16. Framework Evaluation
Evaluation Results
Football Match Descriptions Scenario
● 50 texts describing football matches.
● E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi
managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
Disambiguation Results
17. Framework Evaluation
Evaluation Results
Military Conflict Texts Scenario
● 50 historical texts describing military conflicts.
● E.g. “The Siege of Augusta was a significant battle of the American Revolution.
Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a
major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to
the British and Loyalist forces in the South”.
Disambiguation Results
18. Conclusions and Future Work
Key Points
● We proposed a novel framework for optimizing named entity disambiguation in
well-defined and adequately constrained scenarios through the customized
selection and exploitation of semantic data.
● Our purpose was not to build another generic disambiguation system but rather a
reusable framework that can:
● Be relatively easily adapted to the particular characteristics of the domain and
application scenario at hand.
● Exploit these characteristics to increase the effectiveness of the
disambiguation process.
● The key aspect of the framework is the semi-automatic process it defines for
selecting the optimal evidence model for the scenario at hand.
19. Conclusions and Future Work
Key Points
● Comparative evaluation in two specific scenarios verified the framework’s
superiority over existing approaches that are designed to work in open domains and
unconstrained scenarios.
● This verified our hypothesis that the scenario adaptation capabilities of such generic
disambiguation systems can be inadequate in certain scenarios.
● Of course, the framework’s usability and effectiveness are directly proportional to
the content specificity of the texts to be disambiguated and the availability and
quality of a priori semantic knowledge about their content.
● The greater these two parameters are, the more applicable is our approach
and the more effective the disambiguation is expected to be.
● The opposite is true as the texts become more generic and the information we
have about them more scarce.
20. Conclusions and Future Work
Framework Extensions
● Fully automated construction of the disambiguation evidence model.
● Challenge here is how to automatically identify the text’s domain/topic.
● Combination with statistical methods for cases where available domain semantic
information is incomplete.
● Challenge here is how to select the optimal ratio of ontological evidence vs.
statistical evidence.
● Development of a tool to enable users to dynamically build such models out of
existing semantic data and use them for disambiguation purposes.
21. Thank you!
Contact iSOCO
Dr. Panos Alexopoulos
Senior Researcher
palexopoulos@isoco.com
Questions?
Barcelona Madrid Pamplona Valencia
Tel +34 935 677 200 Tel +34 913 349 797 Tel +34 948 102 408 Tel +34 963 467 143
Edificio Testa A Av. del Partenón, 16-18, 1º7ª Parque Tomás Oficina 107
C/ Alcalde Barnils, 64-68 Campo de las Naciones Caballero, 2, 6º-4ª C/ Prof. Beltrán Báguena, 4
St. Cugat del Vallès 28042 Madrid 31006 Pamplona 46009 Valencia
08174 Barcelona