This paper presents a similarity measurement algorithm for domain-specific terms collected in an
ontology-based data integration system. The algorithm can be used in the ontology mapping and query
service of such a system. In this paper, we focus on applying the proposed algorithm to the web query
service. Concept similarity is important for the web query service because the words in a user's input
query do not wholly match the concepts in the ontology. We therefore need to extract the possible
concepts that match or are related to the input words with the help of the machine-readable dictionary
WordNet. For words whose similarity cannot be confirmed by WordNet, we sometimes use the generated
mapping rules in the query generation procedure. We demonstrate the effect of this algorithm with a
two-degree semantic result of web mining by generating the concept results obtained from the input query.
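The abstract does not say which similarity measure is applied to WordNet, so the following Python sketch is only an illustration of the general idea: mapping a query word to the closest ontology concept through a hypernym hierarchy. The toy `PARENT` tree stands in for WordNet, and the names `path_similarity` and `best_concept` are hypothetical, not the authors' algorithm.

```python
from typing import Dict, List, Sequence

# Hypothetical toy hypernym tree (child -> parent); a stand-in for WordNet.
PARENT: Dict[str, str] = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "entity",
    "dog": "animal",
    "animal": "entity",
}


def path_to_root(term: str) -> List[str]:
    """Return the chain [term, parent, grandparent, ...] up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_similarity(a: str, b: str) -> float:
    """Return 1 / (1 + number of edges between a and b), or 0.0 if unrelated."""
    path_a = path_to_root(a)
    path_b = path_to_root(b)
    for steps_a, ancestor in enumerate(path_a):
        if ancestor in path_b:
            # First shared ancestor found walking up from a is the lowest one.
            distance = steps_a + path_b.index(ancestor)
            return 1.0 / (1.0 + distance)
    return 0.0


def best_concept(word: str, concepts: Sequence[str]) -> str:
    """Pick the ontology concept most similar to the query word."""
    return max(concepts, key=lambda concept: path_similarity(word, concept))
```

With this toy tree, a query word such as "truck" would be mapped to the ontology concept "car" rather than "dog", because the two share the nearer ancestor "vehicle".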
There is a vast amount of unstructured Arabic information on the Web. This data is always organized in
semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that
extracts binary relations between two Arabic named entities from the Web. Several works have addressed
relation extraction from Latin texts, but as far as we know, there is no work on Arabic text that uses a
semi-supervised technique. The goal of this research is to extract a large list or table of named entities
and relations in a specific domain. Only a handful of relation instances are required as input from the
user. The system uses summaries from the Google search engine as source text. The seed instances are used
to extract patterns. The output is a set of new entities and their relations. The results from four
experiments show that precision and recall vary according to relation type. Precision ranges from 0.61 to
0.75, while recall ranges from 0.71 to 0.83. The best result is obtained for the (player, club)
relationship, with 0.72 precision and 0.83 recall.
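As a hedged illustration of how precision and recall figures like these are typically computed, the following Python sketch compares a set of extracted (entity, entity) pairs against a hand-labelled gold set. The function name and the sample pairs are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def precision_recall(extracted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted relation pairs against a gold set."""
    extracted_set = set(extracted)
    gold_set = set(gold)
    true_positives = len(extracted_set & gold_set)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical (player, club) pairs, for illustration only.
extracted_pairs = [("player_a", "club_x"), ("player_b", "club_y"), ("player_c", "club_x")]
gold_pairs = [("player_a", "club_x"), ("player_b", "club_y"),
              ("player_d", "club_z"), ("player_e", "club_z")]
precision, recall = precision_recall(extracted_pairs, gold_pairs)
```

Here two of the three extracted pairs are correct and two of the four gold pairs are found, so precision is 2/3 and recall is 1/2.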
Semantic similarity and, in particular, semantic relatedness
measures are very important in the current scenario
due to the huge demand for natural language processing based
applications such as chatbots and information retrieval systems
such as knowledge base based FAQ systems. Current approaches
generally use similarity measures that do not use the
context-sensitive relationships between words. This leads to erroneous
similarity predictions that are of little use in real-life
applications. This work proposes a novel approach that gives an
accurate relatedness measure of any two words in a sentence by
taking their context into consideration. This context correction
yields a more accurate similarity prediction, which in turn gives
higher accuracy in information retrieval systems.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
Theoretical work submitted to the Journal should be original in its motivation or modeling structure. Empirical analysis should be based on a theoretical framework and should be capable of replication. It is expected that all materials required for replication (including computer programs and data sets) should be available upon request to the authors.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION ijistjournal
User-generated content on the web is growing rapidly in this emergent information age. Evolutionary changes in technology make use of such information to capture only the user's essence, and finally the useful information is exposed to information seekers. Most of the existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for the text data available in each forum. This approach analyses the forum text data and computes a value for each word of the text. The proposed approach combines K-means clustering with a Support Vector Machine with PSO (SVM-PSO) classification algorithm to group the forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naïve Bayes, Decision Tree, and SVM. The experiment shows that K-means and SVM-PSO together achieve highly consistent results.
Feature selection, optimization and clustering strategies of text documents IJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across a wide range of sectors, including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task has to be performed prior to its adaptation to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been made to achieve efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper details prominent models proposed for clustering along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social network and associated environments.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution are essential to the software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and will later evolve into fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and
understand how well the software’s design reflects its requirements. Because reckoning requirements traceability
is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, where two classifiers are trained to separately exploit two types of features, lexical features and features derived from a hand-built ontology, is investigated for this task. The hand-built ontology is also leveraged to generate
pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
A Survey on Sentiment Categorization of Movie Reviews Editor IJMTER
Sentiment categorization is the process of mining user-generated text content to determine
the sentiment of the users towards a particular thing. It is the approach of detecting the sentiment of
the author with regard to some topic. It is also known as sentiment detection, sentiment analysis, and opinion
mining. It is very useful for movie production companies that are interested in knowing how users feel
about their movies. For example, the word “excellent” indicates that a review expresses a positive emotion about
a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social
network sites, web blogs, discussion forums, and so on. Sentiment categorization can be carried out
using three approaches. First, supervised machine learning based text classifiers such as Naïve Bayes,
Maximum Entropy, SVM, kNN, and hidden Markov models. Second, an unsupervised Semantic
Orientation scheme that extracts relevant N-grams of the text and then labels them. Third, the publicly
available SentiWordNet-based library.
Enhancing Keyword Query Results Over Database for Improving User Satisfaction ijmpict
Relational databases are increasingly used to store data and to support keyword queries, but search results often do not give effective answers to a keyword query, which makes keyword search inflexible from the user's perspective. It would be helpful to recognize the types of queries that give low-ranking results. Here we predict query performance to determine the effectiveness of a search performed in response to a query, and we study the features of such hard queries by taking into account the contents of the database and the result list. One relevant problem in databases is the presence of missing data, which can be handled by imputation. Here an inTeractive Retrieving-Inferring data imputation method (TRIP) is used, which alternates retrieving and inferring to fill in the missing attribute values in the database. By considering both the prediction of hard queries and imputation over the database, we can get better keyword search results.
Syntactic search relies on keywords contained in a query to find suitable documents. As a result, documents
that do not contain the keywords but contain information related to the query are not retrieved. Spreading
activation is an algorithm for finding latent information in a query by exploiting relations between nodes in
an associative network or semantic network. However, the classical spreading activation algorithm uses all
relations of a node in the network, which adds unsuitable information to the query. In this paper, we
propose a novel approach for semantic text search, called query-oriented-constrained spreading activation,
that uses only relations relating to the content of the query to find truly related information. Experiments
on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8%
better than the syntactic search and the search using the classical constrained spreading
activation, respectively.
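The abstract gives no algorithmic details, so the following Python sketch only illustrates the general idea of spreading activation restricted to a set of allowed relation types. The graph encoding, the `decay`, `threshold`, and `max_hops` parameters, and the example relations are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# node -> list of (relation, neighbour, edge weight); a hypothetical encoding.
Graph = Dict[str, List[Tuple[str, str, float]]]


def spread_activation(
    graph: Graph,
    seeds: Iterable[str],
    allowed_relations: Set[str],
    decay: float = 0.5,
    threshold: float = 0.1,
    max_hops: int = 2,
) -> Dict[str, float]:
    """Propagate activation from seed nodes along allowed relations only."""
    activation: Dict[str, float] = defaultdict(float)
    frontier: Dict[str, float] = {}
    for seed in seeds:
        activation[seed] = 1.0
        frontier[seed] = 1.0

    for _ in range(max_hops):
        next_frontier: Dict[str, float] = defaultdict(float)
        for node, energy in frontier.items():
            for relation, neighbour, weight in graph.get(node, []):
                if relation not in allowed_relations:
                    continue  # the constraint: ignore relations unrelated to the query
                passed = energy * weight * decay
                if passed >= threshold:
                    next_frontier[neighbour] += passed
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier

    return dict(activation)


# Hypothetical network: only "is_a" links are considered relevant to the query.
example_graph: Graph = {
    "jaguar": [("is_a", "cat", 1.0), ("made_by", "car_maker", 1.0)],
    "cat": [("is_a", "animal", 1.0)],
}
scores = spread_activation(example_graph, ["jaguar"], {"is_a"})
```

In this example "cat" and "animal" receive activation while "car_maker" does not, because the "made_by" relation is excluded. Classical spreading activation would correspond to allowing every relation type.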
Computer-based text summarization draws on
natural language processing and machine systems.
It is very difficult for a human being to
manually summarize a large amount of text. The
greatest challenge for text summarization is to
summarize content from a number of textual and
semi-structured sources, including text, HTML
pages, and portable document files. Summarization
can be determined from internal and external
measures. Our proposed work, sentence similarity
based text summarization using clusters, helps in
finding subjective questions and answers on the
internet. This work provides short units of text
that belong to similar information. I propose a
sentence similarity-based computation that
supports experiments on similar-text
computation. An extractive text summarization
system chooses a subset of the similar group from
the text. In the proposed work I use parts of
speech: proper nouns, verbs, and pronouns such
as he, she, and they. With the help of parts of
speech, we find important sentences using
statistical methods such as proper-noun features
and a sentence similarity system. It is based on
internet information that contains selected
sentences.
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm ijnlc
Information Retrieval (IR) is a very important and vast area. When a user searches for content, the web
returns all the results related to the query. Identifying the relevant result is the most tedious task for a
user. Word Sense Disambiguation (WSD) is the process of identifying the sense of a word in its textual
context when the word has multiple meanings. We have used WSD approaches. This paper presents a Proposed
Dynamic Page Rank algorithm that is an improved version of the Page Rank algorithm. The Proposed Dynamic
Page Rank algorithm gives much better results than Google’s existing Page Rank algorithm. To show this, we
have calculated the Reciprocal Rank for both algorithms and presented comparative results.
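For readers unfamiliar with the metric, the following Python sketch shows how Reciprocal Rank is usually computed for one ranked result list and how it is averaged over several queries. It is a generic illustration under assumed inputs, not the evaluation code used in the paper, and the function names are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked list, relevant set) pairs."""
    run_list = list(runs)
    if not run_list:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in run_list) / len(run_list)
```

Two ranking algorithms can then be compared by computing the mean reciprocal rank of each over the same set of queries; the one that places the first relevant page higher on average scores closer to 1.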
Entity Annotation WordPress Plugin using TAGME Technology TELKOMNIKA JOURNAL
The development of internet technology makes more information accessible. As a result,
information needs to be organized so that it can be easily managed. One solution is to use the
entity annotation approach, which generates tags to represent a document. In this study, TAGME
technology is implemented in a WordPress plugin, which is used to manage a blog. Moreover, information
from Wikipedia ‘Bahasa Indonesia’ is processed to generate the anchor dictionary required by the
implemented technology. This plugin performs entity annotation by giving tag suggestions for posts in
a blog. Testing is carried out by measuring the precision, recall, and of tag suggestions given by the
plugin. The result shows that the plugin can give tag suggestions with precision 0.7638, recall 0.5508, and
0.59.
Text document clustering and similarity detection is a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on their similarity, the documents are grouped into clusters. Several approaches to document similarity calculation have been proposed in existing systems, but these systems are either term based or pattern based, and they suffer from several problems. To address this challenging environment, the proposed system presents an innovative model for document similarity that applies the back propagation time stamp (BPTT) algorithm. It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weights, and BPTT performs the document similarity measurement. Using this approach, documents can be categorized easily, and the problems of the training process are reduced. The framework is named BPTT. It has been implemented and evaluated on the .NET platform with different datasets.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING dannyijwest
Social networks have become one of the most popular platforms for users to communicate and share their interests without being at the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, and Twitter produces a huge amount of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, so that users can get the desired content or be linked to the best link relation using improved search and link techniques. Introducing semantics to social networks will therefore widen the representation of social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies, which are known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with Tag Rank as a social network ranking algorithm. The E-LDA phase improves on the LDA algorithm by using optimal parameters. A filter is then introduced to enhance the final indexing output. In the ranking phase, applying Tag Rank to the output of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in indexing and ranking output.
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW... ijcseit
Mining relevant facts from the web at the time of need has been a difficult task. Research in diverse fields
is fine-tuning methodologies toward this goal, extracting the best information relevant to the user's
search query. The methodology proposed in this paper finds ways to ease the search complexity by
tackling the severe issues that hinder the performance of the traditional approaches in use. The proposed
methodology finds all possible semantically relatable frequent sets with the FP Growth
algorithm. Its outcome then feeds a bio-inspired Fuzzy PSO, which finds the optimal
attractive points around which the web documents are clustered, meeting the requirement of the search query
without losing relevance. On the whole, the proposed system optimizes an objective function that
minimizes the intra-cluster differences and maximizes the inter-cluster distances while keeping all
possible relationships with the search context intact. The major contribution is that the system finds all
possible combinations matching the user search transaction, thereby making the system more
meaningful. These relatable sets form the set of particles for fuzzy clustering as well as PSO; the approach
is thus unbiased and maintains an innate behaviour in which any number of new additions follow the herd
behaviour. Evaluations reveal that the proposed methodology fares well as an optimized and effective
enhancement over the conventional approaches.
In Quarter Two 2011, Communicorp Digital commissioned Edison Research to conduct a nationally representative telephone survey of the Republic of Ireland consisting of 1000 people aged 12 and older. The data are compared with the February 2011 American “Infinite Dial” study. We asked more than one hundred questions. What follows is a ‘highlights tour’ of some key findings.
INFLUENCE OF OVERLAYERS ON DEPTH OF IMPLANTED-HETEROJUNCTION RECTIFIERS Zac Darcy
In this paper we compare the distributions of dopant concentrations in implanted-junction rectifiers in
heterostructures with and without an overlayer. Conditions for decreasing the depth of the
considered p-n junction have been formulated.
tackling the severe issues hindering the performance of traditional approaches in use. The proposed
methodology find effective means to find all possible semantic relatable frequent sets with FP Growth
algorithm. The outcome of which is the further source of fuel for Bio inspired Fuzzy PSO to find the optimal
attractive points for the web documents to get clustered meeting the requirement of the search query
without losing the relevance. On the whole the proposed system optimizes the objective function of
minimizing the intra cluster differences and maximizes the inter cluster distances along with retention of all
possible relationships with the search context intact. The major contribution being the system finds all
possible combinations matching the user search transaction and thereby making the system more
meaningful. These relatable sets form the set of particles for Fuzzy Clustering as well as PSO and thus
being unbiased and maintains a innate behaviour for any number of new additions to follow the herd
behaviour’s evaluations reveals the proposed methodology fares well as an optimized and effective
enhancements over the conventional approaches.
In Quarter Two 2011, Communicorp Digital commissioned Edison Research to conduct a nationally representative telephone survey of the Republic of Ireland consisting of 1000 people age 12 and older Data compared with February 2011 American “Infinite Dial” study. We asked more than one hundred questions. What follows is a ‘highlights tour’ of some key findings.
INFLUENCE OF OVERLAYERS ON DEPTH OF IMPLANTED-HETEROJUNCTION RECTIFIERSZac Darcy
In this paper we compare distributions of concentrations of dopants in an implanted-junction rectifiers in a
heterostructures with an overlayer and without the overlayer. Conditions for decreasing of depth of the
considered p-n-junction have been formulated.
2013. gada 6. jūnijā Rīgā norisinājās Finanšu ministrijas organizēta starptautiska konference „Fiskālās politikas perspektīvas Latvijā un Eiropas Savienībā”.
Konferencē diskutēja par fiskālās politikas reformām, kas ir viens no aktuālākajiem jautājumiem Eiropas politikas darba kārtībā. Konferences dalībnieki tika iepazīstināti arī ar pašreizējām tendencēm Baltijas valstu publiskajās finansēs un turpmākā darba prioritātēm.
A look at Qatar's Transformation, The Real Estate Perspective ...
GCC states on global stage;
Opportunity for emerging countries to enter new markets and geographies...
Buku ini berbeda dari versi yang lainnya hanya berisi bagaimana menggunakan LaTeX untuk musik, puisi, dan lagu tapi sayang nya tidak didukung not lagu hal ini dikarenakan fasilitas nya tidak mendukung
Computing semantic similarity between two words comes with variety of approaches. This is mainly
essential for the applications such as text analysis, text understanding. In traditional system search engines are used to compute the similarity between words. In that search engines are keyword based. There is one drawback that user should know what exactly they are looking for. There are mainly two main approaches for computation namely
knowledge based and corpus based approaches. But there is one drawback that these two approaches are not suitable for computing similarity between multi-word expressions. This system provides efficient and effective approach for computing term similarity using semantic network approach. A clustering approach is used in order to improve the
accuracy of the semantic similarity. This approach is more efficient than other computing algorithms. technique
can also apply to large scale dataset to compute term similarity.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
Document similarity is an important part of Natural Language Processing and is most commonly used for
plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity
algorithm could have a major positive impact on the field of Natural Language Processing. This report sets
out to examine the numerous document similarity algorithms, and determine which ones are the most
useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of
document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based
algorithms. The most effective algorithms in each category are also compared in our work using a series of
benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
Document similarity is an important part of Natural Language Processing and is most commonly used for
plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity
algorithm could have a major positive impact on the field of Natural Language Processing. This report sets
out to examine the numerous document similarity algorithms, and determine which ones are the most
useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of
document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based
algorithms. The most effective algorithms in each category are also compared in our work using a series of
benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
Semantic Search of E-Learning Documents Using Ontology Based Systemijcnes
The keyword searching mechanism is traditionally used for information retrieval from Web based systems. However, this system fails to meet the requirements in Web searching of the expert knowledge base based on the popular semantic systems. Semantic search of E-learning documents based on ontology is increasingly adopted in information retrieval systems. Ontology based system simplifies the task of finding correct information on the Web by building a search system based on the meaning of keyword instead of the keyword itself. The major function of the ontology based system is the development of specification of conceptualization which enhances the connection between the information present in the Web pages with that of the background knowledge.The semantic gap existing between the keyword found in documents and those in query can be matched suitably using Ontology based system. This paper provides a detailed account of the semantic search of E-learning documents using ontology based system by making comparison between various ontology systems. Based on this comparison, this survey attempts to identify the possible directions for future research.
In this paper, we present three techniques for incorporating syntactic metadata in a textual retrieval system. The first technique involves just a syntactic analysis of the query and it generates a different weight for each term of the query, depending on its grammar category in the query phrase. These weights will be used for each term in the retrieval process. The second technique involves a storage optimization of the system's inverted index that is the inverse index will store only terms that are subjects or predicates in the document they appear in. Finally, the third technique builds a full syntactic index, meaning that for each term in the term collection, the inverse index stores besides the term-frequency and the inverse-document-frequency, also the grammar category of the term for each of its occurrences in a document.
This paper presents a natural language processing based automated system called DrawPlus for generating UML diagrams, user scenarios and test cases after analyzing the given business requirement specification which is written in natural language. The DrawPlus is presented for analyzing the natural languages and extracting the relative and required information from the given business requirement Specification by the user. Basically user writes the requirements specifications in simple English and the designed system has conspicuous ability to analyze the given requirement specification by using some of the core natural language processing techniques with our own well defined algorithms. After compound analysis and extraction of associated information, the DrawPlus system draws use case diagram, User scenarios and system level high level test case description. The DrawPlus provides the more convenient and reliable way of generating use case, user scenarios and test cases in a way reducing the time and cost of software development process while accelerating the 70 of works in Software design and Testing phase Janani Tharmaseelan ""Cohesive Software Design"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3 , April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22900.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/22900/cohesive-software-design/janani-tharmaseelan
Ontology Based Approach for Semantic Information Retrieval SystemIJTET Journal
Abstract—The Information retrieval system is taking an important role in current search engine which performs searching operation based on keywords which results in an enormous amount of data available to the user, from which user cannot figure out the essential and most important information. This limitation may be overcome by a new web architecture known as the semantic web which overcome the limitation of the keyword based search technique called the conceptual or the semantic search technique. Natural language processing technique is mostly implemented in a QA system for asking user’s questions and several steps are also followed for conversion of questions to the query form for retrieving an exact answer. In conceptual search, search engine interprets the meaning of the user’s query and the relation among the concepts that document contains with respect to a particular domain that produces specific answers instead of showing lists of answers. In this paper, we proposed the ontology based semantic information retrieval system and the Jena semantic web framework in which, the user enters an input query which is parsed by Standford Parser then the triplet extraction algorithm is used. For all input queries, the SPARQL query is formed and further, it is fired on the knowledge base (Ontology) which finds appropriate RDF triples in knowledge base and retrieve the relevant information using the Jena framework.
Computing semantic similarity measure between words using web search enginecsandit
Semantic Similarity measures between words plays an important role in information retrieval,
natural language processing and in various tasks on the web. In this paper, we have proposed a
Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure
between the words by combining both page count method and web snippets method. Four
association measures are used to find semantic similarity between words in page count method
using web search engines. We use a Sequential Minimal Optimization (SMO) support vector
machines (SVM) to find the optimal combination of page counts-based similarity scores and
top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous
word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction
Algorithm outperforms by 89.8 percent of correlation value.
Object surface segmentation, Image segmentation, Region growing, X-Y-Z image,...cscpconf
Semantic Similarity measures between words plays an important role in information retrieval, natural language processing and in various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure
between the words by combining both page count method and web snippets method. Four association measures are used to find semantic similarity between words in page count method
using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page counts-based similarity scores and
top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction
Algorithm outperforms by 89.8 percent of correlation value.
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
In the last decade, ontologies have played a key technology role for information sharing and agents interoperability in different application domains. In semantic web domain, ontologies are efficiently used toface the great challenge of representing the semantics of data, in order to bring the actual web to its full
power and hence, achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To confront this requirement, mapping ontologies is a solution that is not to be avoided. In deed, ontology mapping build a meta layer that allows different applications and information systems to access and share their informations, of course, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect based on an external lexical resource, wordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character features the
main difference of our contribution with regards to the most of the existing semi-automatic algorithms of ontology mapping, such as Chimaera, Prompt, Onion, Glue, etc. To better enhance the performances of our algorithm, the mapping discovery stage is based on the combination of two sub-modules. The former
analysis the concept’s names and the later analysis their properties. Each one of these two sub-modules is
it self based on the combination of lexical and semantic similarity measures.
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
Ontologisms have been applied to many applications in recent years, especially on Sematic Web, Information Retrieval, Information Extraction, and Question and Answer. The purpose of domain-specific ontology is to get rid of conceptual and terminological confusion. It accomplishes this by specifying a set of generic concepts that characterizes the domain as well as their definitions and interrelationships. This paper will describe some algorithms for identifying semantic relations and constructing an Information Technology Ontology, while extracting the concepts and objects from different sources. The Ontology is constructed based on three main resources: ACM, Wikipedia and unstructured files from ACM Digital Library. Our algorithms are combined of Natural Language Processing and Machine Learning. We use Natural Language Processing tools, such as OpenNLP, Stanford Lexical Dependency Parser in order to explore sentences. We then extract these sentences based on English pattern in order to build training set. We use a random sample among 245 categories of ACM to evaluate our results. Results generated show that our system yields superior performance.
Ontologisms have been applied to many applications in recent years, especially on Sematic Web, Information
Retrieval, Information Extraction, and Question and Answer. The purpose of domain-specific ontology
is to get rid of conceptual and terminological confusion. It accomplishes this by specifying a set of generic
concepts that characterizes the domain as well as their definitions and interrelationships. This paper will
describe some algorithms for identifying semantic relations and constructing an Information Technology
Ontology, while extracting the concepts and objects from different sources. The Ontology is constructed
based on three main resources: ACM, Wikipedia and unstructured files from ACM Digital Library. Our
algorithms are combined of Natural Language Processing and Machine Learning. We use Natural Language
Processing tools, such as OpenNLP, Stanford Lexical Dependency Parser in order to explore sentences.
We then extract these sentences based on English pattern in order to build training set. We use a
random sample among 245 categories of ACM to evaluate our results. Results generated show that our
system yields superior performance.
Query expansion using novel use case scenario relationship for finding featur...IJECEIAES
Feature location is a technique for determining source code that implements specific features in software. It developed to help minimize effort on program comprehension. The main challenge of feature location research is how to bridge the gap between abstract keywords in use cases and detail in source code. The use case scenarios are software requirements artifacts that state the input, logic, rules, actor, and output of a function in the software. The sentence on use case scenario is sometimes described another sentence in other use case scenario. This study contributes to creating expansion queries in feature locations by finding the relationship between use case scenarios. The relationships include inner association, outer association and intratoken association. The research employs latent Dirichlet allocation (LDA) to create model topics on source code. Query expansion using inner, outer and intratoken was tested for finding feature locations on a Java-based open-source project. The best precision rate was 50%. The best recall was 100%, which was found in several use case scenarios implemented in a few files. The best average precision rate was 16.7%, which was found in inner association experiments. The best average recall rate was 68.3%, which was found in all compound association experiments.
Similar to Conceptual similarity measurement algorithm for domain specific ontology[ (20)
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol. 2, No. 2, May 2014
DOI: 10.5121/ijitmc.2014.2204
CONCEPTUAL SIMILARITY MEASUREMENT ALGORITHM FOR DOMAIN SPECIFIC ONTOLOGY
Zin Thu Thu Myint¹ and Kay Khaing Win²
¹University of Technology, Yadanarpon Cyber City, Near Pyin Oo Lwin, Myanmar
²Department of Advance Science and Technology, Nay Pyi Daw, Myanmar
ABSTRACT
This paper presents a similarity measurement algorithm for the domain specific terms collected in an
ontology based data integration system. The algorithm can be used in both the ontology mapping and the
query service of such a system. In this paper, we focus on applying the proposed algorithm to the web
query service. Concept similarity is important for the web query service because the words in a user
input query are not always identical to the concepts in the ontology. We therefore need to extract the
possible concepts that match or are related to the input words with the help of the machine readable
dictionary WordNet. For words whose similarity cannot be confirmed by WordNet, we use the generated
mapping rules in the query generation procedure. We demonstrate the effect of this algorithm with the
two degree semantic result of web mining by generating the concept results obtained from the input
query.
KEYWORDS
Semantic Similarity, Ontology, Concepts, Triplets & Query Processing
1. INTRODUCTION
An ontology based data integration system can present a virtual web portal to its users, because users
get uniform access to multiple data sources that reside in separate locations. Architecturally, it has
three main processing phases: ontology creation, ontology mapping and web query service. In ontology
creation, the expert at each local source builds the local ontology using that source's own domain
concepts. The domain concepts may therefore differ from source to source, which causes semantic
conflicts when these local ontologies are integrated [1, 2]. This is the reason for creating mapping
rules, which help to resolve the semantic conflicts among multiple ontologies. Mapping rules construct
semantic relations such as equivalent, hyponym and homonym among multiple ontologies [3, 4]. These
rules are applied not only to help the integrated ontology handle semantic conflicts but also to help
the web query service by matching the words contained in the user input query with the domain concepts
in the ontology. However, these mapping rules are not sufficient to extract the terms related to the
domain concepts from the user input query, because a user can submit a query without knowing exactly
which domain concepts the ontology contains. The words in the input query may not be wholly equivalent
to the terms contained in the mapping rules. For this reason, other similarity measurement methods need
to be added to the term matching process of the query service.
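The role of the mapping rules in term matching can be illustrated with a minimal Python sketch. The rule
table, its example entries and the function name `match_with_rules` are hypothetical and chosen only for
illustration; the paper does not specify its rule format. The sketch shows how a query word that is
absent from the rules is left unmatched, which is the gap that an additional similarity measurement is
meant to fill.

```python
# Hypothetical mapping rules: query term -> (relation, ontology concept).
# The relation labels follow the ones named in the text.
MAPPING_RULES = {
    "lecturer": ("equivalent", "Teacher"),
    "undergraduate": ("hyponym", "Student"),
}


def match_with_rules(words, rules=MAPPING_RULES):
    """Split query words into rule-matched concepts and unmatched words."""
    matched, unmatched = {}, []
    for word in words:
        rule = rules.get(word.lower())
        if rule is None:
            # No rule covers this word; another similarity method is needed.
            unmatched.append(word)
        else:
            matched[word] = rule
    return matched, unmatched


matched, unmatched = match_with_rules(["Lecturer", "pupil"])
```

In this example "Lecturer" is resolved by a rule, while "pupil" stays unmatched even though a reader
would relate it to a student concept.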
The query service in an ontology based data integration system consists of triplet extraction, query
generation and retrieval of knowledge from the ontology. Similarity measurement mainly supports the
triplet extraction process, which extracts the specific triplets from the user input query and adds the
information needed to build a query that the ontology can process, such as a SPARQL query [5, 6]. There
are many similarity measurement methods for matching the keywords in the input query with the domain
concepts in the ontology. However, they return many possible concepts that are similar to the input
keywords, so their precision and recall are low for query processing in an ontology based data
integration system [7, 8, 9]. The proposed semantic similarity algorithm reduces the problem of
retrieving unnecessary information from the ontology by finding the closest concepts in the ontology
that are similar to the terms in the user input query, with the help of the machine readable dictionary
WordNet. It also provides the specific triplets that support the query generation procedure. As a
result, the precision and recall of the proposed algorithm are high.
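The step from extracted triplets to a SPARQL query can be sketched as follows. This is a minimal
illustration under stated assumptions, not the authors' generator: the function name `build_sparql`, the
namespace `http://example.org/onto#`, and the example class and property names are all hypothetical.
Terms that start with `?` are treated as variables, and every other term is treated as a name in the
assumed ontology namespace.

```python
def build_sparql(triplets, prefix="http://example.org/onto#"):
    """Build a SPARQL SELECT query from (subject, predicate, object) triplets.

    Terms starting with '?' are variables; all other terms are treated as
    names in the (hypothetical) ontology namespace bound to the 'ex' prefix.
    """
    def term(t):
        return t if t.startswith("?") else f"ex:{t}"

    # Collect variables in order of first appearance, without duplicates.
    variables = []
    for triplet in triplets:
        for t in triplet:
            if t.startswith("?") and t not in variables:
                variables.append(t)

    patterns = "\n".join(
        f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triplets
    )
    return (
        f"PREFIX ex: <{prefix}>\n"
        f"SELECT {' '.join(variables)}\n"
        f"WHERE {{\n{patterns}\n}}"
    )


query = build_sparql([
    ("?student", "enrolledIn", "?course"),
    ("?course", "taughtBy", "Teacher"),
])
```

The fewer and more specific the triplets that reach this step, the narrower the generated query, which is
why the quality of the concept matching matters for precision.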
This paper is organised as follows. Section 2 presents an overview of the ontology based data
integration system. The details of the proposed similarity measurement algorithm are described in
Section 3. Section 4 explains the experimental results, based on the precision and recall rates
obtained by generating the SPARQL query and retrieving the required information from the ontology.
Finally, Section 5 concludes the paper with some remarks.
2. ONTOLOGY BASED DATA INTEGRATION SYSTEM
Using an ontology in a data integration system is an ideal solution for handling the semantic conflicts
between various data sources [10]. There are two trends in using ontologies in data integration
systems: one uses the ontology for translating queries or their results, and the other uses the
ontology for generating the global schema [11]. The system presented in this paper follows both trends
for data integration and for accessing data through the integrated ontology. The system architecture is
depicted in Figure 1.
Figure 1. Ontology based data integration system
[Figure labels: End user; Queries/Answers; Triplet Extraction; Send triplets; SPARQL Generator;
Similarity measurement (Detect Keywords/Names, WordNet, Adjustable Similarity Method, Edit
Distance Method); Words; Keywords; Names; Return concepts; Concepts; Mapping table; Global
Ontology; Local 1, Local 2, ..., Local n]
Architecturally, the system follows the Global-as-View (GAV) approach [12].
A local ontology is first created at each local source as a semantic model of the relational
structure of its database: table names become ontology classes, the column names
of each table become data-type properties of the class defined for that
table, and the relationships between classes are defined by object-type properties. The global
ontology is then built from the data collected from the existing local ontologies [13].
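The table-to-class mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the `Table` and `OntologyClass` types, the `build_local_ontology` function and the sample `staff`/`department` schema are hypothetical names introduced here, and treating foreign-key columns as the source of object-type properties is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    """Hypothetical description of one relational table."""
    name: str
    columns: List[str]
    # Assumption: column name -> name of the table it references.
    foreign_keys: Dict[str, str] = field(default_factory=dict)


@dataclass
class OntologyClass:
    """Hypothetical in-memory stand-in for an ontology class."""
    name: str
    datatype_properties: List[str] = field(default_factory=list)
    # Object-type property name -> target class name.
    object_properties: Dict[str, str] = field(default_factory=dict)


def build_local_ontology(tables):
    """Map each table to a class and each column to a property of that class."""
    ontology = {}
    for table in tables:
        cls = OntologyClass(name=table.name)
        for column in table.columns:
            if column in table.foreign_keys:
                # A reference to another table links two classes.
                cls.object_properties[column] = table.foreign_keys[column]
            else:
                # An ordinary column becomes a data-type property.
                cls.datatype_properties.append(column)
        ontology[table.name] = cls
    return ontology


# Hypothetical sample schema, for illustration only.
schema = [
    Table("staff", ["name", "age", "department"], {"department": "department"}),
    Table("department", ["title"]),
]
local_ontology = build_local_ontology(schema)
```

Under the GAV approach, several such local ontologies would then be combined into the global ontology; that merging step is not shown here.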
When users submit queries to the system, the global ontology and mapping
schema are used to retrieve the needed information from the sources. Mapping rules
establish equivalent, homonym and hyponym relations between the words in the input query and
the domain ontology concepts [14, 15]. These mapping rules are constructed by referring to the
semantic similarity. Moreover, a user's input query may not contain terms that are wholly
equivalent to the concepts in the ontology. The terms in the input query must therefore be matched
with the concepts in the ontology by the proposed similarity measurement algorithm, which supplies
the specific concepts for building an ontology understanding query such as SPARQL. The
algorithm is explained in full in the following section. The SPARQL
query language has a graph-based structure, and a query is built by combining triple patterns
(subjects, predicates and objects). To meet the requirements of the query generation procedure,
the triplet extraction process carries out the necessary steps [16]. The triplets obtained
from the input sentence are then used to build the SPARQL query
and retrieve the required information from the ontology to reply to the user.
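As an illustration of how triple patterns combine into a query, the following minimal sketch assembles a SPARQL SELECT query from a list of (subject, predicate, object) triplets. The function name `build_sparql`, the `ex:` prefix, the namespace URI and the sample triplets are hypothetical and are not taken from the system described in this paper.

```python
def build_sparql(triples, prefix="http://example.org/staff#"):
    """Combine (subject, predicate, object) triple patterns into a SELECT query."""
    # Collect the variables (terms starting with '?') in order of first use.
    variables = []
    for triple in triples:
        for term in triple:
            if term.startswith("?") and term not in variables:
                variables.append(term)
    # Each triple pattern becomes one line of the WHERE clause.
    patterns = " .\n  ".join(" ".join(triple) for triple in triples)
    select = " ".join(variables) if variables else "*"
    return (
        f"PREFIX ex: <{prefix}>\n"
        f"SELECT {select}\n"
        f"WHERE {{\n  {patterns} .\n}}"
    )


# Hypothetical triplets for a query such as "staff name".
query = build_sparql([
    ("?staff", "a", "ex:Staff"),
    ("?staff", "ex:name", "?name"),
])
print(query)
```

The sketch only concatenates patterns; a real generator would also need to handle filters (for conditions such as marks > 50) and literal escaping.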
3. SEMANTIC SIMILARITY MEASUREMENT
Accessing the data in an ontology requires an ontology understanding query such as SPARQL.
Because of the structure of SPARQL, specific triplets must be extracted from the user
input query. The terms contained in the user input query are not exactly the same as the
concepts in the ontology, so a similarity measurement algorithm is applied to match the terms
extracted from the user input query with the concepts in the domain specific ontology. The proposed
algorithm relies mainly on the machine readable dictionary WordNet to define the form of
each term in the user input query. Depending on whether a term is a word or a name, it
uses a different similarity measurement method to estimate the closest entities. This type
definition function of the proposed algorithm is shown in Figure 2.
WordNet is a machine readable dictionary that is widely used for confirming the semantic
relationships between overlapping domain concepts of ontologies. In this system, WordNet is used
to discover the form (such as noun, adjective or name) of each word in the
stream of words that makes up the user input query. The proposed algorithm then determines
whether each input word has a known form or is unknown. The rules for deciding on each word
according to its form are as follows.
• If the word has an unknown form, the algorithm treats it as part of a name stream and
assigns null as its form.
• If the word has a known form, the algorithm assigns that form type as its form.
After defining the form of the words, the algorithm finds the similarity between these words and
the concepts in the ontology by applying the GetSimilarity similarity measurement method, and
estimates the similarity of name streams by applying the EditDistance similarity measurement
method. This similarity estimation function is shown in Figure 3.
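The type definition rules can be sketched as below. This is a minimal illustration, not the authors' implementation: a small hand-written dictionary, `TOY_LEXICON`, stands in for the WordNet lookup, and the names `define_type` and `split_query` are hypothetical.

```python
# Assumption: a tiny stand-in for WordNet that maps a word to its form.
TOY_LEXICON = {
    "staff": "noun",
    "department": "noun",
    "name": "noun",
    "advance": "adjective",
}


def define_type(word, lexicon=TOY_LEXICON):
    """Return the form of the word, or None (null) when the form is unknown."""
    return lexicon.get(word.lower())


def split_query(words, lexicon=TOY_LEXICON):
    """Separate words with a known form (keywords) from unknown words (name stream)."""
    keywords, names = [], []
    for word in words:
        form = define_type(word, lexicon)
        if form is None:
            names.append((word, form))      # later compared by edit distance
        else:
            keywords.append((word, form))   # later compared by GetSimilarity
    return keywords, names


keywords, names = split_query(["staff", "name", "John"])
print(keywords)  # words that have a form
print(names)     # words treated as a name stream
```

Keeping the two streams separate is what allows each similarity method to use its own threshold.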
Figure 2. Type definition function in proposed algorithm
As described above, the proposed algorithm chooses the semantic similarity method
according to the type of word returned by the type definition function. It can therefore estimate
the closest concepts in the ontology by adjusting the threshold value of each similarity method
separately. The following sections (3.1 and 3.2) explain these similarity measurement methods in
detail.
Figure 3. Similarity estimation function in proposed algorithm
3.1. Edit Distance Similarity Measurement Method
This method estimates how far apart the source and target concepts are. It can also estimate
the distance for concept streams that contain spaces. The method calculates the distance
from the similarity matrix of the two input strings. Here,
it is used to compute the similarity between words that have no dictionary meaning (e.g. the name
of a person). We fill each cell of the first row of the matrix with the words contained in the naming
input string and each cell of the first column with the words contained in the naming concept stream.
The remaining cells are filled with the values obtained by applying the rule in equation 3. The
following recurrence relations define the edit distance, $d(s_1, s_2)$, of two strings $s_1$ and $s_2$ [17].
$$d(\varepsilon, \varepsilon) = 0 \tag{1}$$
$$d(s, \varepsilon) = d(\varepsilon, s) = |s| \tag{2}$$
$$d(s_1^{-} + c_1,\ s_2^{-} + c_2) = \min\bigl( d(s_1^{-}, s_2^{-}) + p(c_1, c_2),\ d(s_1^{-} + c_1, s_2^{-}) + 1,\ d(s_1^{-}, s_2^{-} + c_2) + 1 \bigr) \tag{3}$$
where $\varepsilon$ represents an empty string, $|s|$ is the length of string $s$, $c_1$ and $c_2$ are the last
characters of $s_1$ ($= s_1^{-} + c_1$) and $s_2$ ($= s_2^{-} + c_2$) respectively, and
$p(c_1, c_2) = 0$ if $c_1 = c_2$; $p(c_1, c_2) = 1$ otherwise. The threshold value for Edit Distance
similarity of two concepts is defined as (distance < name-Length).
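The recurrence in equations 1 to 3 can be evaluated with a standard dynamic-programming table, as in the following minimal sketch. The function names are hypothetical, and reading "name-Length" as the length of the input name is an assumption made here for illustration.

```python
def edit_distance(s1, s2):
    """Edit distance d(s1, s2) computed with the recurrence of equations 1 to 3."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    # Equation 2: the distance to the empty string is the string length.
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    # Equation 3: take the cheapest of substitution, insertion and deletion.
    for i in range(1, rows):
        for j in range(1, cols):
            p = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + p,  # match or substitute the last characters
                d[i][j - 1] + 1,      # drop the last character of s2
                d[i - 1][j] + 1,      # drop the last character of s1
            )
    return d[-1][-1]


def is_name_match(name, concept):
    """Threshold test (distance < name length); the input name's length is assumed."""
    return edit_distance(name, concept) < len(name)


print(edit_distance("John", "Jon"))   # one deletion
print(is_name_match("John", "Jon"))
```

With this threshold, a candidate is rejected only when every character of the input name would have to change, so a stricter bound may be preferable for short names.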
3.2. Get Similarity Measurement Method
This similarity measurement method estimates the similarity for each word that has a form, and
depends mainly on an adjustable parameter. Adding this adjustable parameter to the simple
similarity measurement equation helps to overcome the loss of information that occurs when
one character in the input word differs from the corresponding character in the concept word (e.g.
organisation and organization). The similarity between the two concepts is calculated by using the
following equation.
$$\mathrm{Sim}(w_1, w_2) = \sum_{i=1}^{n} \omega_i \, \mathrm{sim}(c_{1i}, c_{2i}) \tag{4}$$
where $\omega_i$ ($1 \le i \le n$) is an adjustable parameter and $n$ is the number of characters in each
word. Moreover, $\omega_i = 1/n$, which reflects the degree of contribution of each character pair
$(c_{1i}, c_{2i})$ to the overall semantic similarity. $\mathrm{sim}(c_{1i}, c_{2i})$ is the semantic similarity of
the $i$-th characters of the two words. The threshold value for semantic similarity of two concepts is defined as 0.8.
We set this high threshold to obtain the closest elements from the ontology, which mostly
differ from the input word by one character.
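A minimal sketch of this character-wise measure follows. It rests on several assumptions that the text does not state explicitly: characters are compared position by position, a matching pair scores 1 and a mismatch scores 0, every position carries the same weight 1/n, and for words of different lengths n is taken as the longer length. The names `get_similarity` and `is_similar` are hypothetical.

```python
def get_similarity(word, concept):
    """Share of positions at which the two words have the same character.

    Assumptions: position-wise comparison, equal weight 1/n per position,
    and n taken as the length of the longer word.
    """
    n = max(len(word), len(concept))
    if n == 0:
        return 1.0
    # zip() stops at the shorter word, so surplus characters count as mismatches.
    matches = sum(1 for a, b in zip(word.lower(), concept.lower()) if a == b)
    # Equivalent to summing the weight 1/n over the matching positions.
    return matches / n


def is_similar(word, concept, threshold=0.8):
    """Apply the 0.8 threshold; treating the bound as inclusive is an assumption."""
    return get_similarity(word, concept) >= threshold


print(get_similarity("organisation", "organization"))  # 11 of 12 positions match
print(is_similar("organisation", "organization"))
```

Because the comparison is positional, an inserted or deleted character shifts every later position and lowers the score sharply, which is one reason name streams are handled by edit distance instead.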
4. EXPERIMENTAL RESULT
In this section, we show the experimental results of the proposed system, which estimates the
similarity of concepts by applying the proposed semantic similarity algorithm. We build each local
ontology with at most twenty classes, including main classes and subclasses,
and twenty individual instances described by more than 100 data-type properties. We then
create the global ontology by combining two sample local ontologies. The data refers to the staff
profiles and history records of an organization. The proposed algorithm is applied to access the
data on the global ontology. We report the experimental results in terms of two
semantic rates (precision and recall), obtained by generating from the user input query the concepts
that will be used for retrieving the required information from the domain specific ontology.
The proposed semantic similarity measurement algorithm, embedded in the triplet
extraction approach, is applied to the sentences listed below to extract the triplets
in each input string that are associated with concepts in the ontology.
The testing queries are:
Query1: All about John. (n=3)
Query2: staff name at the software engineering department. (n=7)
Query3: Company’s names that are included in advance science and technology department.
(n=11)
Query4: name, age, NRC, father-name of staff who gets M.C.Sc degree and international paper
acceptance. (n=14)
Query5: Staff name, degree, position, start-year, department, compensation and bond-year who
get English exam marks > 50, Major exam marks >50 and had got Ph.D degree. (n=27)
We compare the precision and recall of the concepts retrieved from the sentences listed above
by the proposed algorithm with those of the concepts retrieved
by a traditional semantic similarity measurement approach using
cosine similarity. The comparison is made by submitting the same queries, with the same
word count for each pair, to access the data on the same global ontology. The
precision and recall rates are calculated by using the following equations [18].
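$$\text{Precision} = \frac{\text{number of relevant concepts retrieved}}{\text{total number of concepts retrieved}} \tag{5}$$
$$\text{Recall} = \frac{\text{number of relevant concepts retrieved}}{\text{total number of relevant concepts}} \tag{6}$$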
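The two rates can be computed as in the following minimal sketch, which uses the standard set-based definitions of precision and recall. The function name `precision_recall` and the sample concept lists are hypothetical and do not reproduce the experimental data of this paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved concepts against the relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant concepts that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four concepts returned, three of them relevant.
precision, recall = precision_recall(
    retrieved=["name", "age", "staff", "salary"],
    relevant=["name", "age", "staff"],
)
print(precision, recall)
```

Returning 0.0 for an empty set is a convention chosen here to avoid division by zero.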
Figure 4. Precision rates for different queries
Figure 5. Recall rates for different queries
Figures 4 and 5 show the precision and recall rates obtained by
testing the five queries listed above. They indicate that the proposed semantic
similarity measurement algorithm gives the preferable results.
The average precision and recall rates over these five queries
are compared in Table 1.
[11] F. Hakimpour and A. Geppert, “Resolving Semantic Heterogeneity in Schema Integration: an
Ontology Based Approach,” in Proc. of Conference on Formal Ontology in Information Systems,
Ogunquit, Maine, USA, October 17-19, 2001.
[12] P. Haase and B. Motik, “A Mapping System for the Integration of OWL-DL Ontologies,” IHIS’05,
Bremen, Germany, November 4, 2005.
[13] M. Uschold, Creating, Integrating and Maintaining Local and Global Ontologies, 2002.
[14] H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, and S. Hubner,
“Ontology-based Integration of Information – A Survey of Existing Approaches,” in Proc. of IJCAI-01
Workshop: Ontologies and Information Sharing, Seattle, WA, 2001.
[15] J. Lu, Y. Zhang, Z. Miao, and P. Zhou, The Semantic Web Principle and Technology, Science Press:
Beijing, 2007.
[16] Zin Thu Thu Myint, “Triple Patterns Extraction from Unstructured Sentence Using Domain Specific
Ontology,” in Proc. of 11th International Conference on Computer Application, 2013.
[17] C. D. Manning, P. Raghavan, and H. Schütze, An Introduction to Information
Retrieval, Cambridge UP, 2009, pp. 58-59.
[18] T. Plumbaum, T. Stelter, and A. Korth, “Semantic Web Usage Mining: Using Semantics
to Understand User Intentions,” in G.-J. Houben et al., Eds., LNCS 5535, Heidelberg, Germany, 2009,
pp. 391-396.