The Web has become the first resource people turn to for almost any kind of information, and with the emergence of the Semantic Web our search queries have started generating more informed results. Ontologies are at the core of any Semantic Web application: they support the rapid development of distributed systems by providing information on the fly. This ability to distribute and share information has made ontologies a new knowledge representation mechanism, one strongly backed by a sound inference system. In this paper, we discuss the development, verification and validation of an ontology in the health domain.
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLS (IJDKP)
The biomedical research literature is one of many domains that hide precious knowledge, and the biomedical community makes extensive use of this scientific literature to discover facts about biomedical entities such as diseases and drugs. MEDLINE is a huge database of biomedical research papers that remains a significantly underutilized source of biological information. Discovering useful knowledge in such a large corpus raises problems related to the type of information sought, such as the concepts belonging to the domain of the texts and the semantic relationships among them. In this paper, we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base. The model takes a self-supervised approach to relation extraction (RE), constructing enhanced training examples from UMLS information. The model shows better results than current state-of-the-art and naïve approaches.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM: ... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We performed a case study in a real-life hospital setting. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the retrieval of information from a large volume of data collected over a long period, from August 2011 until January 2012.
Information Retrieval on Text using Concept Similarity (rahulmonikasharma)
This document summarizes a research paper on concept-based information retrieval using semantic analysis and WordNet. It discusses some of the challenges with keyword-based retrieval, such as synonymy and polysemy problems. Concept-based retrieval aims to address these issues by mapping documents and queries to semantic concepts rather than keywords. The paper proposes extracting concepts from text documents using WordNet to identify synonyms, hypernyms and hyponyms. It involves calculating term frequencies to determine a hierarchy of important concepts. The methodology is implemented using Java and WordNet to extract concepts from sample input documents.
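The concept-mapping step can be sketched in a few lines. This is only an illustration of the idea, not the paper's implementation: the `SYNONYMS` table below is a hypothetical stand-in for WordNet synonym lookups, and concepts are ranked by simple term frequency:

```python
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups.
SYNONYMS = {
    "car": "automobile", "auto": "automobile",
    "doctor": "physician", "medic": "physician",
}

def extract_concepts(tokens):
    """Map each token to its canonical concept, then rank concepts by frequency."""
    concepts = [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]
    return Counter(concepts).most_common()

ranked = extract_concepts(["car", "auto", "doctor", "car", "medic"])
# ranked == [('automobile', 3), ('physician', 2)]
```

In a real system the dictionary lookup would be replaced by WordNet synset, hypernym and hyponym queries, but the frequency-ranked concept hierarchy works the same way.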
A Semantic Retrieval System for Extracting Relationships from Biological Corpus (ijcsit)
The World Wide Web holds a large amount of varied information, yet users searching it do not always find the kind of information they expect. Within information extraction, extracting semantic relationships between terms in documents remains a challenge. This paper proposes a system that retrieves documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then detects whether a relationship exists. It uses the Boolean model and pattern recognition to determine the relevant documents and to locate the relationship within a biological document, and it constructs a term-relation table that accelerates the relation extraction step. The proposed method also lets researchers discover the relationship between two biological terms from the information available in the biological documents. For the retrieved documents, the system measures precision and recall.
Biomedical indexing and retrieval system based on language modeling approach (ijseajournal)
This summarizes a research paper that proposes a biomedical indexing and retrieval system called BIOINSY. It uses a language modeling approach to select the best Medical Subject Headings (MeSH) descriptors to index medical articles from sources like PUBMED. The system first preprocesses articles by splitting text, stemming words, and removing stop words. It then extracts terms using a hybrid linguistic and statistical approach. Terms are weighted based on semantic relationships in MeSH, not just statistics. Descriptors are selected by disambiguating terms and estimating the probability a descriptor was generated by the article's language model. Experiments showed the effectiveness of this conceptual indexing approach.
This document discusses using a genetic algorithm to improve search visibility by expanding user queries. It explains that genetic algorithms can be applied to information retrieval by representing candidate solutions as chromosomes, evaluating their fitness, and evolving new generations through selection, crossover and mutation. The paper presents previous work applying genetic algorithms for query expansion and relevance feedback. It then describes the experiment conducted to implement a genetic algorithm over 500 generations to select optimal keywords for expanding queries and evaluate the approach on sample query results.
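The selection-crossover-mutation loop described above can be sketched as follows. This is a toy illustration under assumed data: the `CANDIDATES` keyword pool and the `RELEVANT_TERMS` fitness signal are invented for the example, whereas a real system would score fitness against actual retrieval results:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

CANDIDATES = ["semantic", "ontology", "retrieval", "ranking", "index", "crawler"]
# Hypothetical relevance signal: expansion terms that occur in relevant documents.
RELEVANT_TERMS = {"semantic", "ontology", "retrieval"}

def fitness(chromosome):
    """Reward chosen terms found in relevant documents, lightly penalise noise."""
    chosen = {t for t, bit in zip(CANDIDATES, chromosome) if bit}
    return len(chosen & RELEVANT_TERMS) - 0.5 * len(chosen - RELEVANT_TERMS)

def evolve(generations=100, pop_size=20):
    """Evolve bit-string chromosomes selecting which keywords expand the query."""
    pop = [[random.randint(0, 1) for _ in CANDIDATES] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(CANDIDATES))  # single-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(len(child))            # point mutation, p = 0.1
            child[i] ^= random.random() < 0.1
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [t for t, bit in zip(CANDIDATES, best) if bit]
```

With this toy fitness function the population converges on exactly the relevant expansion terms; the paper's experiment runs the same loop for 500 generations against sampled query results.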
This document proposes a new method to re-rank web documents retrieved by search engines based on their relevance to a user's query using ontology concepts. It involves building an ontology of concepts for a given domain (electronic commerce), extracting concepts from retrieved documents, and re-ranking documents based on the frequency of ontology concepts within them. An evaluation showed the approach reduced average ranking error compared to search engines alone. The method was tested on the first 30 documents retrieved for the query "e-commerce" from search engines.
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Because the average user query is restricted to two or three keywords, it is often ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that might be useful or relevant to the user's information need; query processing therefore plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation was based on the dataset specified by the Forum for Information Retrieval (FIRE15), using precision and relative recall as the criteria, and the analysis considers the importance of each step in query processing. The experimental results show the significance of each step in query processing, as well as the relevance of web semantics and spelling correction in the user query.
This document discusses the use of fuzzy queries to retrieve information from databases. Fuzzy queries allow for imprecise or vague terms to be used in queries, similar to natural language. The document first provides background on limitations of traditional database queries. It then discusses how fuzzy set theory and membership functions can be applied to queries and data to handle uncertain terms. The proposed approach applies fuzzy queries to a relational database, defining linguistic variables and membership functions. This allows information to be retrieved based on fuzzy criteria and improves the ability to query databases using human-like terms. Benefits of fuzzy queries include more natural interaction and accounting for real-world data imperfections.
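The membership-function idea can be made concrete with a small sketch. The linguistic variable "young", its cut-off ages, and the 0.5 threshold below are assumptions chosen for illustration, not values from the document:

```python
def membership_young(age):
    """Degree to which an age counts as 'young' (assumed cut-offs: 25 and 40)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15  # linear descent between the cut-offs

def fuzzy_select(rows, predicate, threshold=0.5):
    """Return (row, degree) pairs whose membership degree meets the threshold."""
    return [(r, predicate(r["age"])) for r in rows if predicate(r["age"]) >= threshold]

people = [{"name": "Ann", "age": 22},
          {"name": "Bob", "age": 33},
          {"name": "Cy", "age": 45}]
hits = fuzzy_select(people, membership_young)
# Ann (degree 1.0) qualifies; Bob (about 0.47) falls below the 0.5 threshold
```

A crisp query `age <= 25` would reject Bob outright for the same reason, but the fuzzy degree makes the near-miss visible and tunable via the threshold.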
An Overview on the Use of Data Mining and Linguistics Techniques for Building... (ijcsit)
The use of Online Social Networks (OSNs) such as Facebook and Twitter to exchange and disseminate news and information in real time is becoming more and more popular. Twitter in particular allows the instant dissemination of short messages, in the form of microblogs, to followers. This survey reviews the literature to examine how OSNs, such as the microblogging tool Twitter, can help in detecting spreading epidemics. The paper highlights significant challenges in Natural Language Processing (NLP) when building microblog-based early disease detection systems: microblogging data is an unstructured collection of short messages (140 characters in Twitter) with noise and non-standard use of the English language. Research is therefore exploring linguistics to determine the semantics of the text, and data mining techniques to extract information useful for detecting disease spread. The survey also discusses applications and existing early disease detection systems based on OSNs, and outlines directions for future research on improving such systems through a combination of linguistic methods, data mining techniques and recommendation systems.
Cluster Based Web Search Using Support Vector Machine (CSCJournals)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. The method described here exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information, and Internet users today access billions of web pages online using search engines. Information on the Web comes from many sources, including the websites of companies, organizations and communities, personal homepages, etc. Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but documents can be totally dissimilar and still correspond to the same meaning of the query. To address this, the method exploits the fact that relevant Web pages are often located close to each other in the Web graph of hyperlinks. The paper presents a graphical approach to entity resolution that complements the traditional methodology with an analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed, and demonstrates a technique that measures the degree of interconnectedness between pairs of nodes in the graph, which can significantly improve the quality of entity resolution. Support vector machines (SVMs), a family of related supervised learning methods, are used to classify and distribute the load of user queries from the server machine to different client machines so that the system remains stable; the system clusters web pages based on machine capacities and stores the whole database on the server machine. Keywords: SVM, cluster, ER.
An improved technique for ranking semantic associations (t07IJwest)
The primary focus of the search techniques in the first generation of the Web was accessing relevant documents from the Web. Though this satisfies user requirements, it is insufficient when the user wishes to access actionable information involving complex relationships between two given entities. Finding such complex relationships (also known as semantic associations) is especially useful in applications such as national security, pharmacy and business intelligence. The next frontier is therefore discovering relevant semantic associations between two entities in large semantic metadata repositories. Given two entities, a huge number of semantic associations may exist between them, so these associations must be ranked in order to find the more relevant ones. For this, Aleman-Meza et al. proposed a method involving six metrics: context, subsumption, rarity, popularity, association length and trust. To compute the overall rank of the associations, this method computes the context, subsumption, rarity and popularity values for every component of every association. However, many components appear repeatedly in many associations, so it is not necessary to compute the context, subsumption, rarity, popularity and trust values of the components anew for each association; the previously computed values can be reused when computing the overall rank. This paper proposes a method to reuse the previously computed values using a hash data structure, thus reducing the execution time. To demonstrate the effectiveness of the proposed method, experiments were conducted on the SWETO ontology. The results show that the proposed method is more efficient than the other existing methods.
The web has become a resourceful tool for almost all domains today. Search engines prominently use the inverted-indexing technique to locate the web pages matching a user's query. The performance of an inverted index fundamentally depends on searching for the keyword in the list maintained by the search engine, and text matching is done with the help of a string matching algorithm. It is important for any string matching algorithm to quickly locate the occurrences of a user-specified pattern in a large text. In this paper a new string matching algorithm for keyword searching is proposed. The proposed algorithm relies on a new technique based on the pattern length and an FML (First-Middle-Last) character match. The algorithm is analysed and implemented, with extensive testing and comparisons against Boyer-Moore, Naïve, Improved Naïve, Horspool and Zhu-Takaoka. The results show that the proposed algorithm takes less time than the other existing algorithms.
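The abstract does not give the algorithm's shift rules, so the sketch below only illustrates the First-Middle-Last idea itself: compare the first, middle and last pattern characters at each alignment before paying for a full comparison:

```python
def fml_search(text, pattern):
    """Find all occurrences of pattern in text, screening each alignment
    with a First-Middle-Last character check before the full comparison.
    A sketch of the idea only; the paper's shift heuristics are not reproduced."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    mid = m // 2
    hits = []
    for i in range(n - m + 1):
        # Cheap three-character screen rejects most alignments early.
        if (text[i] == pattern[0]
                and text[i + m - 1] == pattern[-1]
                and text[i + mid] == pattern[mid]):
            if text[i:i + m] == pattern:  # full verification
                hits.append(i)
    return hits

positions = fml_search("abracadabra", "abra")
# positions == [0, 7]
```

The screen is the same spirit as the Boyer-Moore family's bad-character checks: most alignments fail on one of three cheap comparisons, so the O(m) verification runs rarely.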
IRJET - Detection of Drug Abuse using Social Media Mining (IRJET Journal)
This document discusses a study that aims to detect drug abuse patterns by mining social media data, specifically from Instagram. The study scrapes Instagram posts containing drug-related hashtags and analyzes the text and images. Text analysis identifies frequent drug slang and collocations between hashtags. Posts are filtered based on hashtags to collect information. A list of drug-related hashtags is generated. The model aims to dynamically detect illegal drug posts through hashtags and learn drug user lingo to study behavioral patterns and identify other potentially involved user accounts and locations.
Intelligent Semantic Web Search Engines: A Brief Survey (dannyijwest)
The World Wide Web (WWW) allows people to share information (data) from large database repositories globally, and the amount of information has grown to billions of records. To search this information we need specialized tools known generically as search engines. Although many search engines are available today, retrieving meaningful information remains difficult. To overcome this problem and let search engines retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present a survey of the search engine generations and the role of search engines in the intelligent web and semantic search technologies.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Wikipedia as an Ontology for Describing Documents (Zareen Syed)
The document presents an approach called "Wikitology" that uses Wikipedia as an ontology for describing and summarizing documents. It outlines methods for mapping documents to Wikipedia articles and categories using similarity metrics and spreading activation across the Wikipedia link graph. The authors conducted experiments predicting concepts for single and multiple documents and evaluated accuracy by comparing to human judgments. They discuss applications like document expansion and future work improving the methods and exploiting the Wikipedia structure.
This document presents the development of a Web Access Literacy Scale to measure users' abilities to critically evaluate information found online. The researchers conducted a study with 534 participants to develop and validate the scale. Factor analysis resulted in a 7-factor scale measuring logical approach, content verification strategies, inquisitiveness, bias tolerance, search skills, author verification, and objectivity. Scores were higher for those with information literacy experience. The scale can help identify weaknesses and inform the development of literacy training and search tools.
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine (IDES Editor)
In a Meta Search Engine, result merging is the key component. Meta Search Engines provide a uniform query interface for Internet users to search for information: depending on users' needs, they select relevant sources, map user queries onto the target search engines and subsequently merge the results. The effectiveness of a Meta Search Engine is closely related to the result merging algorithm it employs. In this paper, we propose a Meta Search Engine with two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results with the designed ranking algorithm. Initially, the query given by the user is passed to the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept similarity based method and a cosine similarity based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experimentation is carried out to prove the efficiency of the proposed visible and invisible web-based Meta Search Engine in merging the relevant pages, with TSAP used as the evaluation criterion for the algorithms.
This study was presented in the 20th International Conference on Web Information Systems Engineering (WISE 2019), 19-22 January 2020.
-----
Fumiaki Saito, Yoshiyuki Shoji and Yusuke Yamamoto: Highlighting Weasel Sentences for Promoting Critical Information Seeking on the Web. Proceedings of the 20th International Conference on Web Information Systems Engineering (WISE 2019), pp. 424-440, Hong Kong, China, November 2019.
Correlation Coefficient Based Average Textual Similarity Model for Informatio... (IOSR Journals)
The document presents a proposed model for a textual similarity approach for information retrieval systems in wide area networks. It evaluates the performance of four similarity functions (Jaccard, Cosine, Dice, Overlap) using correlation coefficients. Three approaches are proposed: 1) Combining Cosine and Overlap similarity scores, which performed best. 2) Combining Cosine, Dice, and Overlap scores. 3) Combining all four similarity functions. The model is represented as a triangle where the vertices are the results from the three proposed approaches to measure textual similarity between retrieved documents.
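The four similarity functions can be sketched on token sets as follows. This is a hedged illustration: the set-based cosine variant and the simple averaging rule for the combined score are assumptions, as the paper may use weighted term vectors and a different combination scheme:

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b)

def dice(a, b):
    """2|A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    return len(a & b) / min(len(a), len(b))

def cosine(a, b):
    """Set-based cosine: |A ∩ B| / sqrt(|A| * |B|)"""
    return len(a & b) / math.sqrt(len(a) * len(b))

def combined_cosine_overlap(a, b):
    # Assumed combination rule for the best-performing Cosine + Overlap pairing.
    return (cosine(a, b) + overlap(a, b)) / 2

doc = {"fuzzy", "query", "database", "retrieval"}
q = {"fuzzy", "retrieval"}
# jaccard = 0.5, dice ≈ 0.667, overlap = 1.0, cosine ≈ 0.707
```

Because Overlap normalises by the smaller set, a short query fully contained in a document always scores 1.0, which is why pairing it with the stricter Cosine measure is a plausible balance.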
RAPID INDUCTION OF MULTIPLE TAXONOMIES FOR ENHANCED FACETED TEXT BROWSING (ijaia)
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method builds the taxonomy from sentence-level word co-occurrence frequencies, while the second bootstraps a Word2Vec-based algorithm with a directed crawler. We exploit DMOZ, the multilingual open-content directory of the World Wide Web, to seed the crawl, and the domain name to direct the crawl. The resulting domain corpus is then input to our algorithm, which can automatically induce taxonomies. The induced taxonomies provide hierarchical semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook, Instagram, Flickr) with the objective of enhancing an individual's exploration of their personal information through faceted searching. We also perform a comprehensive corpus-based evaluation of the algorithms on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies), and show that the induced taxonomies are of high quality.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Ontology oriented concept based clustering (eSAT Journals)
Abstract: Worldwide, health-centre scientists, physicians and patients are accessing, analyzing, integrating and storing massive amounts of digital medical data in different databases. The potential for information retrieval is vast and daunting. The objective of our approach is to separate relevant information from irrelevant information through user-friendly and efficient search algorithms. The traditional solution employs keyword-based search without semantic consideration, so keyword retrieval may return inaccurate and incomplete results. To overcome the problem of retrieving information from such a huge amount of data, a concept-based clustering method grounded in ontology is needed. In the proposed method, WordNet is integrated to match synonyms for the identified keywords so as to obtain accurate information, and concept-based clustering is developed using the k-means algorithm in accordance with the principles of ontology so that the importance of the words in a cluster can be identified. Keywords: Ontology, Concept based clustering, K-means algorithm, Information retrieval.
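The synonym-matching step this abstract describes can be pictured with a small sketch; the toy synonym table below is a hand-made stand-in for a real WordNet lookup, which would normally come from a library such as NLTK:

```python
# Toy stand-in for a WordNet synonym lookup (illustrative only).
SYNONYMS = {
    "tumor": {"tumour", "neoplasm"},
    "doctor": {"physician", "clinician"},
}

def expand_keywords(keywords):
    """Return the identified keywords plus any known synonyms, so that
    documents using a different term for the same concept still match."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= SYNONYMS.get(word, set())
    return expanded

print(sorted(expand_keywords(["tumor", "scan"])))
# ['neoplasm', 'scan', 'tumor', 'tumour']
```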
An Improved Mining Of Biomedical Data From Web Documents Using Clustering (Kelly Lipiec)
This document summarizes a research paper that proposes an improved method for mining biomedical data from web documents using clustering. Specifically, it develops an optimized k-means clustering algorithm to group similar biomedical documents together based on identifying relevant terms using the Unified Medical Language System (UMLS). The approach aims to more efficiently retrieve relevant biomedical documents for users. It compares the proposed method to the original k-means algorithm and finds it achieves an average F-measure of 99.06%, indicating more accurate clustering of biomedical web documents.
How to conduct_a_systematic_or_evidence_review (Eaglefly Fly)
This document provides guidance on conducting a systematic or evidence-based literature review. It discusses defining search terms, identifying relevant articles through database searches and other methods, applying inclusion/exclusion filters to evaluate articles, synthesizing results, and summarizing the evidence found to determine the best intervention. The goal is to reduce bias and provide a comprehensive review of a topic through an explicit and transparent process.
This document discusses the use of fuzzy queries to retrieve information from databases. Fuzzy queries allow for imprecise or vague terms to be used in queries, similar to natural language. The document first provides background on limitations of traditional database queries. It then discusses how fuzzy set theory and membership functions can be applied to queries and data to handle uncertain terms. The proposed approach applies fuzzy queries to a relational database, defining linguistic variables and membership functions. This allows information to be retrieved based on fuzzy criteria and improves the ability to query databases using human-like terms. Benefits of fuzzy queries include more natural interaction and accounting for real-world data imperfections.
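As a minimal illustration of the membership functions this summary refers to (a textbook example, not the document's own), a triangular membership function maps a crisp value such as an age to a degree of membership in a vague term such as "young":

```python
def triangular(x, a, b, c):
    """Triangular membership function: degree 0 at or outside [a, c],
    rising linearly to 1 at the peak b, then falling back to 0."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Degree to which age 35 satisfies a fuzzy term "young" peaking at 25.
print(triangular(35, 0, 25, 50))  # 0.6
```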
An Overview on the Use of Data Mining and Linguistics Techniques for Building... (ijcsit)
The usage of Online Social Networks (OSNs), such as Facebook and Twitter, is becoming more and more popular for exchanging and disseminating news and information in real time. Twitter in particular allows the instant dissemination of short messages in the form of microblogs to followers. This survey reviews the literature to explore and examine how OSNs, such as the microblogging tool Twitter, can help in the detection of spreading epidemics. The paper highlights significant challenges in the field of Natural Language Processing (NLP) when using microblog-based early disease detection systems. For instance, microblogging data is an unstructured collection of short messages (140 characters in Twitter), with noise and non-standard use of the English language. Hence, research is currently exploring the field of linguistics in order to determine the semantics of the text, and uses data mining techniques in order to extract useful information for disease spread detection. Furthermore, the survey discusses applications and existing early disease detection systems based on OSNs and outlines directions for future research on improving such systems based on a combination of linguistics methods, data mining techniques and recommendation systems.
Cluster Based Web Search Using Support Vector Machine (CSCJournals)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. This method exploits a variety of semantic information extracted from web pages. The rapid growth of the Internet has made the Web a popular place for collecting information; today, Internet users access billions of web pages online using search engines. Information on the Web comes from many sources, including websites of companies, organizations and communities, personal homepages, etc. Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, note that the relevant Web pages are often located close to each other in the Web graph of hyperlinks. This work presents a graphical approach for entity resolution and complements the traditional methodology with an analysis of the entity-relationship (ER) graph constructed for the dataset being analyzed. It also demonstrates a technique that measures the degree of interconnectedness between various pairs of nodes in the graph, which can significantly improve the quality of entity resolution. Support vector machines (SVMs), a family of supervised learning methods, are used to classify and distribute the load of user queries from the server to different client machines so that the system remains stable; web pages are clustered based on the clients' capacities, while the whole database is stored on the server machine. Keywords: SVM, cluster, ER.
An improved technique for ranking semantic associations (t07IJwest)
The primary focus of the search techniques in the first generation of the Web is accessing relevant documents from the Web. Though this satisfies user requirements, it is insufficient, as the user sometimes wishes to access actionable information involving complex relationships between two given entities. Finding such complex relationships (also known as semantic associations) is especially useful in applications such as National Security, Pharmacy, Business Intelligence, etc. Therefore the next frontier is discovering relevant semantic associations between two entities present in large semantic metadata repositories. Given two entities, there exist a huge number of semantic associations between them; hence ranking of these associations is required in order to find the more relevant ones. For this, Aleman-Meza et al. proposed a method involving six metrics, viz. context, subsumption, rarity, popularity, association length and trust. To compute the overall rank of the associations, this method computes context, subsumption, rarity and popularity values for each component of each association. However, many components appear repeatedly in many associations, so it is not necessary to compute the context, subsumption, rarity, popularity and trust values of the components anew for each association; the previously computed values may be used while computing the overall rank. This paper proposes a method to reuse the previously computed values using a hash data structure, thus reducing the execution time. To demonstrate the effectiveness of the proposed method, experiments were conducted on the SWETO ontology. Results show that the proposed method is more efficient than the other existing methods.
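The hash-based reuse described in this abstract boils down to memoization; a rough sketch (with a dummy stand-in for the actual context/subsumption/rarity/popularity metrics) might look like:

```python
# Cache of per-component metric values; each distinct component of an
# association is scored once and then reused across all associations.
metric_cache = {}

def component_score(component):
    """Compute (or reuse) the metric value for one association component.
    The length-based formula is a dummy placeholder for the real metrics."""
    if component not in metric_cache:
        metric_cache[component] = len(component) / 10.0
    return metric_cache[component]

def association_rank(association):
    """Rank an association as the sum of its components' cached scores."""
    return sum(component_score(c) for c in association)

print(association_rank(["treats", "drugX"]))
print(len(metric_cache))  # 2: each distinct component scored exactly once
```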
The web has become a resourceful tool for almost all domains today. Search engines prominently use the inverted indexing technique to locate the web pages matching a user's query. The performance of an inverted index fundamentally depends upon searching for the keyword in the list maintained by the search engine. Text matching is done with the help of a string matching algorithm, and it is important for any string matching algorithm to quickly locate the occurrences of the user-specified pattern in a large text. In this paper a new string matching algorithm for keyword searching is proposed. The proposed algorithm relies on a new technique based on pattern length and FML (First-Middle-Last) character match. The proposed algorithm is analysed and implemented, and extensive testing and comparisons are done with Boyer-Moore, Naïve, Improved Naïve, Horspool and Zhu-Takaoka. The results show that the proposed algorithm takes less time than the other existing algorithms.
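The paper itself is only summarized here, but the First-Middle-Last idea can be sketched as a cheap pre-filter before a full window comparison (a naive illustration, not the authors' exact algorithm):

```python
def fml_search(text, pattern):
    """Return all start indices of pattern in text. The first, middle and
    last characters are checked before the full comparison, so most
    mismatching windows are rejected after at most three comparisons."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    mid = m // 2
    hits = []
    for i in range(len(text) - m + 1):
        if (text[i] == pattern[0]
                and text[i + mid] == pattern[mid]
                and text[i + m - 1] == pattern[-1]
                and text[i:i + m] == pattern):
            hits.append(i)
    return hits

print(fml_search("abracadabra", "abra"))  # [0, 7]
```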
IRJET - Detection of Drug Abuse using Social Media Mining (IRJET Journal)
This document discusses a study that aims to detect drug abuse patterns by mining social media data, specifically from Instagram. The study scrapes Instagram posts containing drug-related hashtags and analyzes the text and images. Text analysis identifies frequent drug slang and collocations between hashtags. Posts are filtered based on hashtags to collect information. A list of drug-related hashtags is generated. The model aims to dynamically detect illegal drug posts through hashtags and learn drug user lingo to study behavioral patterns and identify other potentially involved user accounts and locations.
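The hashtag-collocation step described in this summary can be approximated with a few lines of standard Python (the hashtags below are made-up examples, not the study's data):

```python
from collections import Counter
from itertools import combinations
import re

def hashtags(caption):
    """Extract the distinct, lower-cased hashtags from a post caption."""
    return sorted(set(re.findall(r"#(\w+)", caption.lower())))

posts = [
    "party tonight #MDMA #molly",
    "feeling it #molly #rolling",
    "#mdma #molly again",
]

# Count how often each pair of hashtags appears together in one post.
pair_counts = Counter()
for post in posts:
    pair_counts.update(combinations(hashtags(post), 2))

print(pair_counts.most_common(1))  # [(('mdma', 'molly'), 2)]
```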
Intelligent Semantic Web Search Engines: A Brief Survey (dannyijwest)
The World Wide Web (WWW) allows people to share information (data) from large database repositories globally. The amount of information has grown to billions of records across databases, and searching it requires specialized tools known generically as search engines. Although many search engines are available today, retrieving meaningful information remains difficult. To overcome this problem and retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present a survey of the search engine generations and the role of search engines in the intelligent web and semantic search technologies.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Wikipedia as an Ontology for Describing Documents (Zareen Syed)
The document presents an approach called "Wikitology" that uses Wikipedia as an ontology for describing and summarizing documents. It outlines methods for mapping documents to Wikipedia articles and categories using similarity metrics and spreading activation across the Wikipedia link graph. The authors conducted experiments predicting concepts for single and multiple documents and evaluated accuracy by comparing to human judgments. They discuss applications like document expansion and future work improving the methods and exploiting the Wikipedia structure.
This document presents the development of a Web Access Literacy Scale to measure users' abilities to critically evaluate information found online. The researchers conducted a study with 534 participants to develop and validate the scale. Factor analysis resulted in a 7-factor scale measuring logical approach, content verification strategies, inquisitiveness, bias tolerance, search skills, author verification, and objectivity. Scores were higher for those with information literacy experience. The scale can help identify weaknesses and inform the development of literacy training and search tools.
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine (IDES Editor)
In a Meta Search Engine, result merging is the key component. Meta Search Engines provide a uniform query interface for Internet users to search for information. Depending on users' needs, they select relevant sources, map user queries onto the target search engines, and subsequently merge the results. The effectiveness of a Meta Search Engine is closely related to the result merging algorithm it employs. In this paper, we have proposed a Meta Search Engine which has two distinct steps: (1) searching through surface and deep search engines, and (2) ranking the results through the designed ranking algorithm. Initially, the query given by the user is input to the deep and surface search engines. The proposed method uses two distinct algorithms for ranking the search results: a concept-similarity-based method and a cosine-similarity-based method. Once the results from the various search engines are ranked, the proposed Meta Search Engine merges them into a single ranked list. Finally, experimentation is done to prove the efficiency of the proposed visible- and invisible-web-based Meta Search Engine in merging the relevant pages. TSAP is used as the evaluation criterion and the algorithms are evaluated against it.
This study was presented in the 20th International Conference on Web Information Systems Engineering (WISE 2019), 19-22 January 2020.
-----
Fumiaki Saito, Yoshiyuki Shoji and Yusuke Yamamoto: Highlighting Weasel Sentences for Promoting Critical Information Seeking on the Web. Proceedings of the 20th International Conference on Web Information Systems Engineering (WISE 2019), pp. 424-440, Hong Kong, China, November 2019.
Correlation Coefficient Based Average Textual Similarity Model for Informatio... (IOSR Journals)
The document presents a proposed model for a textual similarity approach for information retrieval systems in wide area networks. It evaluates the performance of four similarity functions (Jaccard, Cosine, Dice, Overlap) using correlation coefficients. Three approaches are proposed: 1) Combining Cosine and Overlap similarity scores, which performed best. 2) Combining Cosine, Dice, and Overlap scores. 3) Combining all four similarity functions. The model is represented as a triangle where the vertices are the results from the three proposed approaches to measure textual similarity between retrieved documents.
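The four similarity functions named in this summary have standard set-based definitions; over the term sets of two documents they can be written as follows (a textbook formulation, not the paper's code):

```python
import math

def jaccard(a, b):
    """Intersection over union of the two term sets."""
    return len(a & b) / len(a | b)

def dice(a, b):
    """Twice the intersection over the sum of set sizes."""
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a, b):
    """Intersection over the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))

def cosine(a, b):
    """Set-based (binary) cosine similarity."""
    return len(a & b) / math.sqrt(len(a) * len(b))

d1 = {"ontology", "health", "retrieval"}
d2 = {"health", "retrieval", "database", "query"}
print(jaccard(d1, d2))  # 0.4
```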
RAPID INDUCTION OF MULTIPLE TAXONOMIES FOR ENHANCED FACETED TEXT BROWSING (ijaia)
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level word co-occurrence frequency method for building the taxonomy, while the second involves bootstrapping a Word2Vec-based algorithm with a directed crawler. We exploit the multilingual open-content directory of the World Wide Web, DMOZ, to seed the crawl, and the domain name to direct the crawl. This domain corpus is then input to our algorithm, which can automatically induce taxonomies. The induced taxonomies provide hierarchical semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook, Instagram, Flickr) with the objective of enhancing an individual's exploration of their personal information through faceted searching. We also perform a comprehensive corpus-based evaluation of the algorithms on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that the induced taxonomies are of high quality.
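The sentence-level co-occurrence signal that the first method relies on can be sketched as follows (toy sentences, not the paper's corpus):

```python
from collections import Counter
from itertools import combinations

# Tokenized sentences from a (made-up) domain corpus.
sentences = [
    ["diabetes", "insulin", "treatment"],
    ["diabetes", "diet", "treatment"],
    ["insulin", "injection"],
]

# Count co-occurrences of term pairs within each sentence; frequent
# pairs are candidate parent/child terms for the induced taxonomy.
cooc = Counter()
for tokens in sentences:
    cooc.update(combinations(sorted(set(tokens)), 2))

print(cooc[("diabetes", "treatment")])  # 2
```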
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ... (ijcsa)
This document summarizes a research paper that introduces a text mining-based method for answering biological queries and testing hypotheses. The proposed approach analyzes hypotheses stated as natural language questions and measures their statistical significance based on existing literature. It computes a p-value to determine whether to accept or reject each hypothesis. The method also generates a network of related biological entities to provide context and suggest new hypotheses for further investigation. The goal is to help researchers quantitatively evaluate assumptions and guide relevant discovery of new biological knowledge.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
This document summarizes a research paper that proposes a machine learning approach to identify disease-treatment relationships from biomedical text. It extracts sentences mentioning diseases and treatments from medical publications and classifies the semantic relationships between them. The researchers evaluate their methodology on a dataset of sentences annotated with diseases, treatments and their relationships. Their results show the machine learning models can reliably extract this information and outperform previous methods on the same data. The proposed approach could be integrated into applications to disseminate healthcare information from published literature to medical professionals and patients.
Search Interface Feature Evaluation in Biosciences (Zanda Mark)
This paper reports findings on desirable interface features for different search tasks in the biomedical domain. We conducted a user study in which we asked bioscientists to evaluate the usefulness of autocomplete, query expansion, faceted refinement, related searches and results preview implementations in new pilot interfaces and publicly available systems, using baseline queries and their own. Our evaluation reveals that there is a preference for certain features depending on the search task. In addition, we touch on the current pain point of faceted search: the acquisition of faceted subject metadata for unstructured documents. We found a strong preference for prototypes displaying just a few facets generated based on either the query or the matching documents.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We performed a case study in real-life hospital settings. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data covering the period from August 2011 until January 2012.
The document discusses a study examining how doctors make judgments about online health information and how those judgments impact their search behaviors. The study analyzed diaries and interviews from 35 doctors documenting 444 search incidents. The results showed that doctors are aware of the importance of information quality and source credibility but rarely make explicit evaluations of these. Instead, doctors tend to rely on predictive judgments about information quality and credibility at familiar websites based on their past experiences. These predictive judgments introduce bias that influences doctors' cognitive search models and strategies. The study suggests doctors' mental models and experience enable these predictive judgments and that cognitive search should be studied longitudinally rather than as isolated tasks.
This document summarizes a research paper that proposes using machine learning algorithms and natural language processing techniques to extract disease and treatment information from medical texts. It discusses using a pipeline of tasks including identifying relevant sentences, representing the sentences, and classifying the relationships between diseases and treatments. The paper reviews previous work on using algorithms like naive Bayes and conditional naive Bayes for information extraction and relation classification. It proposes applying machine learning and natural language processing to biomedical texts from sources like Medline to automatically extract symptoms, causes, and treatments for diseases specified in user queries. This extracted information could help doctors and patients by providing structured medical knowledge in a time-saving manner.
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE (IJCSEIT Journal)
This document summarizes a research paper that proposes three models for query expansion in a Hindi search engine: 1) Using lexical resources like HindiWordNet to find synonyms and related terms, 2) Using user context information like location, interests and profession, 3) Combining lexical resources and user context. An experiment compares the precision of results from simple Google searches to searches using each model. Precision was highest using the combined Model III at 0.79, showing that integrating lexical and user context information improves search quality in Hindi.
Identifying Structures in Social Conversations in NSCLC Patients through the ... (IJERA Editor)
The exploration of social conversations for addressing patients' needs is an important analytical task to which many scholarly publications are contributing in order to fill the knowledge gap in this area. The main difficulty remains the inability to turn such contributions into pragmatic processes the pharmaceutical industry can leverage in order to generate insight from social media data, which can be considered one of the most challenging sources of information available today due to its sheer volume and noise. This study is based on the work by Scott Spangler and Jeffrey Kreulen and applies it to identify structure in social media through the extraction of a topical taxonomy able to capture the latent knowledge in social conversations on health-related sites. The mechanism for automatically identifying and generating a taxonomy from social conversations is developed and pressure-tested using public data from media sites focused on the needs of cancer patients and their families. Moreover, a novel method for generating a category's label and determining an optimal number of categories is presented, which extends Scott and Jeffrey's research in a meaningful way. We assume the reader is familiar with taxonomies, what they are and how they are used.
Key Topics in Health Care Technology Evaluation (sleeperfindley)
Key Topics in Health Care Technology Evaluation
The amount of new information and data, and the number of available technologies are growing at an ever-accelerating rate. Did you know that during any given 24 hours, humanity generates enough new information to fill the Library of Congress 70 times (Smolan & Erwitt, 2012)? As a nurse informaticist, it is important to keep current on new developments in the field, but with the rapid pace of change, that effort can be overwhelming. It is easier to keep current with key trends if nurse informaticists focus on selected issues.
In this Discussion, you consider key topics in the field of health care technology. You then consider the different approaches you could take when designing an evaluation in these areas. For example, if you are interested in usability, your goal could be to determine if a system is user friendly from the viewpoint of a nurse. A different goal might be to determine if the location of the system facilitates ease of use from the viewpoint of physicians.
Note:
This Discussion serves as practice for the first part of your Evaluation Project. What you derive from your Discussion with colleagues will likely inform the work that you do in Part 1 of the Evaluation Project.
The Discussion focuses on the following major topics in the health care information field:
Implementing HIT Systems
Consumer health information
Computerized Physician Order Entry (CPOE)
Decision support systems
Electronic health records (EHR)
Tele-medicine and eHealth
Nursing documentation
Other Issues Related to the Use of HIT Systems
Interoperability
Unforeseen consequences
Usability
To prepare:
Select at least two topics from the lists above that are relevant to your current organization or that are of particular interest to you. Read the articles in this week's Learning Resources that relate to these topics. Consider why these topics are of interest to you, what relevance they have to health care organizations, and how they impact your professional responsibilities. Choose one topic to be the focus of your Evaluation Project, and consider potential evaluation goals.
Determine the viewpoint from which you would approach the evaluation, and why.
By tomorrow, post a minimum 550-word essay in APA format, with a minimum of 3 references from the list of required resources below, that addresses the level-one headings as numbered below:
1) Post the two topics you identified as most relevant to your organization or to you personally, and explain why you selected those topics.
2) Identify the topic you selected for your Evaluation Project, and propose three potential evaluation goals for this topic.
3) Identify the viewpoint you would use with each goal, and explain why.
Required Readings
Friedman, C. P., & Wyatt, J. C. (2010). Evaluation methods in biomedical informatics (2nd ed.). New York, NY: Springer Science+Business Media, Inc.
Chapter 2, “Evaluation as a Field” (pp. 21–47)
This chapter defines.
Document Retrieval System, a Case StudyIJERA Editor
In this work we have proposed a method for automatic indexing and retrieval. This method returns the most likely documents related to the input query. The technique used in this project is singular-value decomposition: a large term-by-document matrix is analyzed and decomposed into 100 factors, and each document is represented by a 100-item vector of factor weights. Queries, on the other hand, are represented as pseudo-document vectors, which are formed from weighted combinations of terms.
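The pipeline this abstract describes (truncated SVD of a term-by-document matrix, documents as factor-weight vectors, queries folded in as pseudo-documents) can be sketched in a few lines of numpy. The toy matrix, the factor count k = 2 (the paper uses 100), and the query weights below are illustrative assumptions, not the system's actual data:

```python
import numpy as np

# Toy term-by-document matrix A (rows = terms, cols = documents).
# The described system uses a large matrix and k = 100 factors.
terms = ["health", "ontology", "retrieval", "query"]
A = np.array([
    [2, 0, 1],   # "health"
    [0, 3, 1],   # "ontology"
    [1, 1, 2],   # "retrieval"
    [1, 0, 2],   # "query"
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]   # rank-k truncation

# Each document becomes a k-item vector of factor weights.
doc_vecs = (np.diag(sk) @ Vtk).T           # shape (n_docs, k)

# A query is folded in as a pseudo-document: q_hat = q @ U_k @ S_k^{-1}.
q = np.array([1, 0, 1, 1], dtype=float)    # weighted term counts of the query
q_vec = q @ Uk @ np.diag(1.0 / sk)

# Rank documents by cosine similarity to the pseudo-document vector.
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cos(q_vec, d) for d in doc_vecs]
best = int(np.argmax(scores))
print("most likely document:", best)
```

The fold-in step is what lets new queries be compared in the reduced factor space without recomputing the SVD.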
Great model a model for the automatic generation of semantic relations betwee...ijcsity
The large available amount of non-structured texts that belong to different domains such as healthcare (e.g. medical records), justice (e.g. laws, declarations), insurance (e.g. declarations), etc. increases the effort required for the analysis of information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing or annotating the texts. Particularly, text summary strategies have proven to be very useful to provide a compact view of an original text. However, the available strategies to generate these summaries do not fit very well within domains that require taking into consideration the temporal dimension of the text (e.g. a recent piece of text in a medical record is more important than a previous one) and the profile of the person who requires the summary (e.g. the medical specialization). To cope with these limitations this paper presents "GReAT", a model for automatic summary generation that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and discover new information from the detection of related information. The GReAT model was implemented in software to be validated in a health institution, where it has shown to be very useful to display a preview of the information in medical health records and to discover new facts and hypotheses within the information. Several tests, such as Functionality, Usability and Performance tests, were executed on the implemented software. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information incurred by providing a text shorter than the original.
Here are the key points about using content-based filtering techniques:
- Content-based filtering relies on analyzing the content or description of items to recommend items similar to what the user has liked in the past. It looks for patterns and regularities in item attributes/descriptions to distinguish highly rated items.
- The item content/descriptions are analyzed automatically by extracting information from sources like web pages, or entered manually from product databases.
- It focuses on objective attributes about items that can be extracted algorithmically, like text analysis of documents.
- However, personal preferences and what makes an item appealing are often subjective qualities not easily extracted algorithmically, like writing style or taste.
- So while content-based filtering can
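The idea in the points above can be illustrated with a minimal sketch, assuming TF-IDF weighting over item descriptions and cosine similarity as the pattern-matching step; the item names and descriptions are hypothetical, not from any specific system:

```python
import math
from collections import Counter

# Hypothetical item descriptions; in a real system these would be
# extracted from web pages or entered from a product database.
items = {
    "item1": "wound healing stages inflammation repair",
    "item2": "ontology semantic web health records",
    "item3": "wound care infection healing complications",
}
liked = "item1"  # the item the user rated highly in the past

def tf_idf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency."""
    tokenized = {k: v.split() for k, v in docs.items()}
    n = len(docs)
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))
    vecs = {}
    for k, toks in tokenized.items():
        tf = Counter(toks)
        vecs[k] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Recommend the unseen item whose description is most similar to the liked one.
vecs = tf_idf_vectors(items)
scores = {k: cosine(vecs[liked], v) for k, v in vecs.items() if k != liked}
recommendation = max(scores, key=scores.get)
print(recommendation)  # → item3 (shares "wound" and "healing" with item1)
```

This captures the objective-attribute side of content-based filtering; as the last point notes, subjective qualities like writing style are not recovered by such term statistics.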
Similar to DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES (20)
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems diversifies the exploitation of the images generated by these systems in different geoscience applications. The detection and monitoring of surface deformations produced by various phenomena have benefited from this evolution and have been realized by interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, the spatial and temporal decorrelation of the interferometric couples used strongly limits the precision of the analysis results of these techniques. In this context, we propose in this work a methodological approach to surface deformation detection and analysis by differential interferograms, to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures by simulating a linear fault merged into image couples from the ERS1/ERS2 sensors acquired over a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel trajectory-guided, concatenative approach for synthesizing high-quality video rendered from real image samples is proposed. The automated lip-reading seeks to model the real image sample sequence preserved in the library that is closest, given the video data, to the HMM-predicted trajectory. The object trajectory is obtained by projecting the face patterns into a KDA feature space, where it is estimated. The approach to speaker face identification synthesizes the identity surface of a subject's face from a small sample of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images; subsequent work consists of reducing the fundamental lip-feature vector to a low-dimensional representation using the 2D-DCT. The dimensionality of the mouth-area set is reduced by PCA to obtain the Eigen-lips approach proposed by [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer a software engineering capstone course to simulate a real-world working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from the Waterfall process to the Agile process in conducting the software engineering capstone project. We present the capstone course designs for both Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables and assessment plans. To evaluate the improvement, we conducted a survey in two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled out the survey. The survey consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to attain hands-on experience, which simulates a real-world working environment. The results also show that the Agile approach helped students to achieve an overall better design and avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team capabilities and training needs and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
This document discusses using social media technologies to promote student engagement in a software project management course. It describes the course and objectives of enhancing communication. It discusses using Facebook for 4 years, then switching to WhatsApp based on student feedback, and finally introducing Slack to enable personalized team communication. Surveys found students engaged and satisfied with all three tools, though less familiar with Slack. The conclusion is that social media promotes engagement but familiarity with the tool also impacts satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering systems are referred to as intelligent systems that can provide responses to questions asked by the user based on facts or rules stored in a knowledge base, generating answers to questions asked in natural language. One of the first motivations of fuzzy logic was the problem of computer understanding of natural language. This survey paper therefore provides an overview of what Question Answering is, its system architecture, and its possible relationship with fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
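The abstract does not give DPW's exact cost model, but the dynamic-programming global alignment it builds on is the classic edit-distance recurrence over phone sequences. A minimal sketch, with unit substitution and insertion/deletion costs assumed for illustration:

```python
def phone_distance(p, q, sub_cost=1.0, indel_cost=1.0):
    """Global-alignment (edit) distance between two phone sequences,
    computed with the dynamic-programming recurrence DPW builds on."""
    m, n = len(p), len(q)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost          # delete all of p's prefix
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost          # insert all of q's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if p[i - 1] == q[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + cost,    # match / substitute
                          d[i - 1][j] + indel_cost,  # delete
                          d[i][j - 1] + indel_cost)  # insert
    return d[m][n]

# Two hypothetical pronunciations of the same word (ARPAbet-style phones).
print(phone_distance(["T", "AH", "M", "EY", "T", "OW"],
                     ["T", "AH", "M", "AA", "T", "OW"]))  # → 1.0
```

Small distances between pronunciations of the same word are what allow variant entries to be merged into, or added to, a pronunciation dictionary.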
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is computed based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessment.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarm-intelligence-based metaheuristics. The ABO algorithm is inspired by the buffalo's behavior and lifestyle. Unfortunately, the standard ABO algorithm was proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and a probability model to generate binary solutions. In the second version (called LBABO) they use logical operators to operate on the binary solutions. Computational results on instances of two knapsack problems (KP and MKP) show the effectiveness of the proposed algorithms and their ability to achieve good and promising solutions.
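The SBABO binarization step can be sketched as follows, assuming the standard sigmoid-transfer probability model used in binary metaheuristics (the ABO position-update equations themselves are not reproduced here):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binarize(continuous_position, rng=None):
    """SBABO-style transfer: map each continuous component x to {0, 1}
    by drawing 1 with probability sigmoid(x). The sigmoid squashes the
    unbounded continuous search value into a Bernoulli probability."""
    rng = rng or random.Random(42)
    return [1 if rng.random() < sigmoid(x) else 0 for x in continuous_position]

# A hypothetical continuous buffalo position mapped to a binary solution,
# e.g. an item-selection vector for a knapsack instance.
print(binarize([2.5, -3.0, 0.0, 4.0]))
```

Large positive components almost always map to 1 and large negative ones to 0, so the continuous dynamics of the metaheuristic steer the binary solution while still allowing stochastic exploration near zero.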
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure and ensure a persistent presence on a compromised host. Amongst the various DDNS techniques, the Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and weighted scores of the domain names. The approach's feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmically generated domain names. Findings from this study show that domain names made up of English characters "a-z" that achieve a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
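A sketch of frequency-based scoring in this spirit: each letter is weighted by its relative frequency in English text, so names drawn from natural language score higher than algorithmically generated strings. The weight table and scaling below are illustrative assumptions, not the paper's exact scoring function, so its threshold of 45 is not reproduced:

```python
# Relative frequency (%) of letters in English text (standard table).
ENGLISH_FREQ = {
    'a': 8.167, 'b': 1.492, 'c': 2.782, 'd': 4.253, 'e': 12.702,
    'f': 2.228, 'g': 2.015, 'h': 6.094, 'i': 6.966, 'j': 0.153,
    'k': 0.772, 'l': 4.025, 'm': 2.406, 'n': 6.749, 'o': 7.507,
    'p': 1.929, 'q': 0.095, 'r': 5.987, 's': 6.327, 't': 9.056,
    'u': 2.758, 'v': 0.978, 'w': 2.360, 'x': 0.150, 'y': 1.974,
    'z': 0.074,
}

def weighted_score(domain):
    """Mean English-letter-frequency weight per character, scaled by 10.
    A low score suggests the name was not drawn from natural language."""
    letters = [c for c in domain.lower() if c in ENGLISH_FREQ]
    if not letters:
        return 0.0
    return sum(ENGLISH_FREQ[c] for c in letters) / len(letters) * 10

print(weighted_score("example"))   # dictionary-like name: higher score
print(weighted_score("xq3vkzjw"))  # DGA-like name: much lower score
```

The ranking behavior, not the absolute numbers, is the point: rare-letter-heavy generated names fall well below common English words under any frequency-weighted score.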
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The document proposes a blockchain-based digital currency and streaming platform called GoMAA to address issues of piracy in the online music streaming industry. Key points:
- GoMAA would use a digital token on the iMediaStreams blockchain to enable secure dissemination and tracking of streamed content. Content owners could control access and track consumption of released content.
- Original media files would be converted to a Secure Portable Streaming (SPS) format, embedding watermarks and smart contract data to indicate ownership and enable validation on the blockchain.
- A browser plugin would provide wallets for fans to collect GoMAA tokens as rewards for consuming content, incentivizing participation and addressing royalty discrepancies by recording
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This document discusses the importance of verb suffix mapping in discourse translation from English to Telugu. It explains that after anaphora resolution, the verbs must be changed to agree with the gender, number, and person features of the subject or anaphoric pronoun. Verbs in Telugu inflect based on these features, while verbs in English only inflect based on number and person. Several examples are provided that demonstrate how the Telugu verb changes based on whether the subject or pronoun is masculine, feminine, neuter, singular or plural. Proper verb suffix mapping is essential for generating natural and coherent translations while preserving the context and meaning of the original discourse.
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method are verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The document discusses automated penetration testing and provides an overview. It compares manual and automated penetration testing, noting that automated testing allows for faster, more standardized and repeatable tests but has limitations in developing new exploits. It also reviews some current automated penetration testing methodologies and tools, including those using HTTP/TCP/IP attacks, linking common scanning tools, a Python-based tool targeting databases, and one using POMDPs for multi-step penetration test planning under uncertainty. The document concludes that automated testing is more efficient than manual for known vulnerabilities but cannot replace manual testing for discovering new exploits.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid-1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention from neuroscientists and computer scientists, since it opens a new window to explore
the functional network of the human brain with relatively high resolution. The BOLD technique provides
an almost accurate state of the brain. Past research proves that neurological diseases damage brain
network interactions, protein-protein interactions and gene-gene interactions. A number of
neurological research papers also analyse the relationships among damaged parts. By
computational methods, especially machine learning techniques, we can show such classifications.
In this paper we use the OASIS fMRI dataset of patients affected by Alzheimer's disease and a
dataset of normal patients. After properly processing the fMRI data, we use the processed data to build
classifier models using SVM (Support Vector Machine), KNN (K-nearest neighbour) and Naïve
Bayes. We also compare the accuracy of our proposed method with that of existing methods. In future
work, we will try other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
The document proposes a new validation method for fuzzy association rules based on three steps: (1) applying the EFAR-PN algorithm to extract a generic base of non-redundant fuzzy association rules using fuzzy formal concept analysis, (2) categorizing the extracted rules into groups, and (3) evaluating the relevance of the rules using structural equation modeling, specifically partial least squares. The method aims to address issues with existing fuzzy association rule extraction algorithms such as large numbers of extracted rules, redundancy, and difficulties with manual validation.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within-class imbalance, where classes are composed of
multiple sub-concepts with different numbers of examples, also affects the performance of the
classifier. In this paper, we propose an oversampling technique that handles between-class and
within-class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps: performing Model Based
Clustering with respect to classes to identify the sub-concepts, and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for image processing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy was achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets are so large and complex
that they are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in the three largest cities in the UAE.
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes them attractive to hate speakers seeking to mask their
criminal activities online, posing a challenge to the world and in particular to Ethiopia. With this
ever-increasing volume of social media data, hate speech identification becomes a challenge,
aggravating conflict between citizens of nations. The high rate of production has made it
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposes the application of Apache Spark in hate speech detection to reduce these
challenges. The authors developed an Apache Spark based model to classify Amharic Facebook
posts and comments into hate and not hate. They employed Random Forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation,
the model based on Word2Vec embeddings performed best with 79.83% accuracy. The
proposed method achieves a promising result with the unique features of Spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% of words are
correctly tagged on the training set whereas 74.38% of words are tagged correctly on the testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
352 Computer Science & Information Technology (CS & IT)
meta information available for the website. This framework has been implemented by ontologies.
In fact, ontologies are saved in this format and are termed RDF (Resource Description
Framework).
The reason for ontologies being the first choice for semantic web is because they can very easily
specify the concepts and unambiguously maintain concept hierarchies. For example, if a user
gives a search query as “Manmohan Singh” or “Indian Head Dr. Singh” or “Indian Government
Leader” or “Prime Minister of India” to a search engine, then a normal search engine, in some
cases would not be able to find the correct results whereas a semantic web based search engine
(which is backed by ontologies) can very easily provide the same results to all the search queries.
Moreover, if a search query like “plane bomb” is provided to a search engine then it will not be
able to relate the two different concepts whereas a semantically informed search engine would
relate it with plane bombing and associated threats and would display the results accordingly. This
is a clear advantage of semantic search engines (or ontologies) over normal search engines.
This leads us to the question, “What is an ontology?”. According to Gruber[2], an ontology can be defined
as “an explicit specification of conceptualization”. We need ontologies because they can easily relate
similar concepts and can find associations between similar concepts' relations and properties, thus
reducing the burden of explicitly defining everything.
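This equivalence idea can be illustrated with a minimal sketch: a table of equivalence sets (standing in for owl:equivalentClass / owl:sameAs axioms in a real ontology) that maps every surface term to one canonical concept. The groupings reuse the paper's own examples; the code is an illustration, not the authors' implementation:

```python
# Each set groups surface terms that an ontology would declare equivalent.
# Groupings reuse the paper's examples; a real ontology would state these
# as owl:equivalentClass / owl:sameAs axioms.
EQUIVALENT_TERMS = [
    {"HIV+", "AIDS", "acquired immunodeficiency syndrome"},
    {"Manmohan Singh", "Prime Minister of India", "Indian Government Leader"},
]

# Map every term (case-insensitively) to one canonical representative.
canonical = {}
for group in EQUIVALENT_TERMS:
    rep = sorted(group)[0]          # pick one representative per concept
    for term in group:
        canonical[term.lower()] = rep

def same_concept(a, b):
    """True when both terms resolve to the same known concept."""
    ca, cb = canonical.get(a.lower()), canonical.get(b.lower())
    return ca is not None and ca == cb

print(same_concept("HIV+", "AIDS"))  # → True
```

A semantic search engine backed by such axioms can return the same results for any of the equivalent query strings, which is exactly the advantage claimed over plain keyword matching.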
The rest of the paper is arranged as follows: In Section 2 we review the previous work done in
ontology creation and also the work done in health information processing. In Section 3 we
describe the complete development process of our ontology. In Section 4 we evaluate the concepts
and properties using a reasoner which implements description logic; using this reasoner we verify
the integrity of the concepts, relations and properties. Section 5 concludes the work done.
2. LITERATURE SURVEY
Many researchers have investigated search behaviors of various users for gathering information
regarding health. Bhavnani et al.[3] showed that co-occurrence counts of medical information
(like symptoms and disorders) on a web page significantly influence search behavior. Spink
et al.[4] showed that when health-related information is to be searched, users move from
general-purpose search engines to specialized search engines. Hersh et al.[5] reviewed the medical
informatics and information science literature regarding how physicians use IR systems to gain
confidence in clinical question answering and decision making. They found that these retrieval
systems were inadequate as they retrieved very few relevant articles on a given topic. They did a
follow-up review with another study[6] which showed how students of medicine and
nursing use MEDLINE to get information for clinical question answering. They found that,
with the help of literature searching, users were only slightly successful at answering clinical
questions. Eastin and Guinsler[7] investigated the relationship between online health information
seeking for a symptom or disease and visiting a general practitioner to inquire about the same. They
found that a user's level of health anxiety moderates the relationship between online health
information seeking and health care utilization decisions.
The development of ontologies has been investigated from different points of view. Hearst[8] started
working in this area with concept annotations. Even today, his seminal work on lexico-syntactic
patterns is very relevant for annotation-based knowledge applications. This work has been refined
and reused by several researchers who have applied different approaches. For example, Poesio et
al.[9] extended Hearst patterns for anaphora resolution and used machine learning approaches in
identifying patterns. Vashisth et al.[10] developed an ontology of human anatomy. This was a
heavy-weight ontology which was divided into four sub-parts: skeletal structure, nervous system,
digestive system and cardiovascular system. In all they created around 600 concepts which have
several sub-concepts, relations and properties. Mathur et al.[11] developed an ontology of health
care services in India, built by collecting health-related news articles. Etzioni[12] and Market[13]
showed the use of ontologies on the Internet by using search engine APIs. Some researchers have
also applied lexico-syntactic patterns to the identification of other lexical relations, like Charniak
and Berland[14] using them to represent part-of relations and Girju and Moldovan[15] representing
causal relations. Cederberg and Widdows[16] showed how one can improve pattern filtering of
Hearst patterns by using Latent Semantic Analysis. Morin and Jacquemin[17] and Ravichandran
and Hovy[18] worked on the automatic generation of patterns via similarity-based approaches in
which vectors were formed using the same patterns. This approach proved to be more generalized
than what was proposed by Hearst.
In the literature, we can also find systems developed using these techniques. Buitelaar et
al.[19] presented OntoLT, an ontology-learning plugin for the Protégé ontology
editor; this system annotated part-of-speech chunks and grammatical relations using a parser.
Velardi et al.[20] presented the OntoLearn system, in which terms were extracted for a domain
from a domain-specific textual corpus; this tool became one of the most important tools for the
automatic creation of ontologies from text. Niang et al.[21] showed the need for
customized development of domain-specific ontologies, arguing that the ontology development
process can differ across domains and cannot be engineered with the same techniques everywhere.
Shamsfard et al.[22] described the development of the Persian WordNet (FarsiNet) through
semi-automatic processes.
3. PROPOSED WORK
A lot of information is available on health-related topics, but most websites providing the same
information about a particular disease or medical procedure use different terminologies. This is a
major bottleneck in the development of standards and guidelines for the annotation of health data,
and it inspired us to develop an ontology in the health domain that improves the reuse of timely
information and can map similar concepts and relations available from different sources. For
example, one website may use the term HIV+ whereas another uses AIDS for the same information.
Although both refer to the same information, search engines would treat them as different search
terms, so the information related to these terms is also treated as different. Using ontologies we
can represent such terms as equivalent, so that all the information attached to one concept can be
used by the other.
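The effect of such an equivalence mapping can be sketched in plain Python (a minimal illustration only, not an OWL implementation; the canonical concept names, index structure and document identifiers below are hypothetical):

```python
# Minimal sketch: mapping equivalent surface terms to one canonical concept,
# so that a query for either term retrieves the same information.
# All terms and document identifiers here are illustrative, not real data.

EQUIVALENT_TERMS = {
    "HIV+": "HIV/AIDS",
    "AIDS": "HIV/AIDS",
    "Flu": "Influenza",
    "Influenza": "Influenza",
}

# Hypothetical index: canonical concept -> documents mentioning it.
INDEX = {
    "HIV/AIDS": ["doc-12", "doc-47"],
    "Influenza": ["doc-03"],
}

def search(term: str) -> list[str]:
    """Resolve a surface term to its canonical concept, then look it up."""
    concept = EQUIVALENT_TERMS.get(term, term)
    return INDEX.get(concept, [])

# Both surface forms now retrieve the same documents.
assert search("HIV+") == search("AIDS") == ["doc-12", "doc-47"]
```

Without the equivalence table, the two surface forms would hit disjoint index entries, which is exactly the fragmentation described above.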
3.1 Methodology
Efforts have been put into developing ontologies because they can store data semantically, which
helps in building applications for the semantic web and other areas where semantic knowledge
holds the key value. A key reason for developing ontologies is their ability to learn from the real
world, identify instances and create relations among them. Using ontologies, we can generalize
health-domain terminology and its concepts. Since many diseases share the same symptoms, an
ordinary application struggles to relate a symptom to the right disease, which creates ambiguity;
ontological taxonomies are quite useful in disambiguating them. An ontology is also a useful
tool for developing standards and guidelines for interoperability between various health care
information services and heterogeneous web resources. Ontology development is necessarily an
iterative process: each concept in the ontology should emulate a real-life entity and its
relationships in the domain of interest. In the following sections we show the development of our
ontology.
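The disambiguating role of such a taxonomy can be sketched as follows (a minimal plain-Python illustration; the disease and symptom assignments are illustrative stand-ins, not clinical data):

```python
# Minimal sketch: a taxonomy mapping diseases to symptom sets can narrow down
# candidate diseases even when individual symptoms overlap.
# The disease/symptom data below is illustrative only, not clinical fact.

DISEASE_SYMPTOMS = {
    "Measles":    {"Fever", "Rash", "Sneeze"},
    "Chickenpox": {"Fever", "Rash", "Sneeze", "Blisters"},
    "Influenza":  {"Fever", "Cough", "Sneeze"},
}

def candidate_diseases(observed: set[str]) -> list[str]:
    """Return diseases whose symptom set covers every observed symptom,
    most specific (fewest total symptoms) first."""
    matches = [d for d, s in DISEASE_SYMPTOMS.items() if observed <= s]
    return sorted(matches, key=lambda d: len(DISEASE_SYMPTOMS[d]))

# A shared symptom alone is ambiguous; adding one more disambiguates.
assert candidate_diseases({"Sneeze"}) == ["Measles", "Influenza", "Chickenpox"]
assert candidate_diseases({"Sneeze", "Blisters"}) == ["Chickenpox"]
```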
3.2 Specification
In this phase we collected information about various concepts from books, journals and websites
and carried out a comprehensive study of the finer points. Where we had doubts, we consulted
experts such as doctors, paramedics and medical representatives, whose insights helped us resolve
them. Once all the information related to diseases, their possible symptoms and possible cures
had been collected, we ascertained the domain and scope of our ontology.
3.3 Conceptualization
We started building our ontology by first implementing the concepts (classes). We created Human
Communicable Disease (HCD) as the super-class and all the sub-systems (diseases, symptoms,
causes) as its sub-classes. While creating the different classes, we found that some of them
shared properties and relations, so we marked these as equivalent classes. We also found that
some classes had no common properties, so these were marked as disjoint classes. Figure 1
illustrates these concepts.
After implementing the classes, we moved on to defining their internal structure through
relations, object properties and the relations between relations. Binary relationships between
individuals were implemented through object properties. Figure 2 shows the implementation of
this in our ontology. The relations used in our ontology were as follows:
• Antonyms (communicable, Non-communicable)
• Synonyms (flu, Influenza)
• Synonyms (TB, Tuberculosis)
• Is-a (Disease, Health Domain)
• Is-a (Chickenpox, communicable)
• Is-a (Flu, communicable)
• Is-a (Measles, communicable)
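Stored as plain (subject, relation, object) triples, the relations above already support simple reasoning such as a transitive walk over Is-a links. The sketch below is a plain-Python illustration, not our Protégé implementation; the communicable → Disease link is an assumed addition to complete the chain:

```python
# Minimal sketch: the relations above as (subject, relation, object) triples,
# with a transitive walk over Is-a links.

TRIPLES = [
    ("communicable", "Antonym", "Non-communicable"),
    ("flu", "Synonym", "Influenza"),
    ("TB", "Synonym", "Tuberculosis"),
    ("Disease", "Is-a", "Health Domain"),
    ("communicable", "Is-a", "Disease"),   # assumed link, added for the walk
    ("Chickenpox", "Is-a", "communicable"),
    ("Flu", "Is-a", "communicable"),
    ("Measles", "Is-a", "communicable"),
]

def ancestors(concept: str) -> set[str]:
    """Follow Is-a edges upward to collect all ancestor concepts."""
    result: set[str] = set()
    frontier = [concept]
    while frontier:
        node = frontier.pop()
        for s, r, o in TRIPLES:
            if s == node and r == "Is-a" and o not in result:
                result.add(o)
                frontier.append(o)
    return result

assert ancestors("Measles") == {"communicable", "Disease", "Health Domain"}
```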
Figure 1. Structure of Health Ontology
Figure 2. Object Property of relations
3.5 Creation of Instances
We created individual instances of the concepts, differentiated according to their components;
individual properties and their relations were used for this purpose. Figure 3 shows the instance
diagram.
Figure 3. Creation of Instances
3.6 Ontology Visualization
To view the relationships between the various concepts, relations and their properties, we created
a graphical view of our ontology and verified the equivalence relationships between concepts.
Figure 4 shows a detailed view of our ontology. Here, the '+' symbol indicates concepts and
relations that can be expanded; circle-marked rectangles show the concepts and diamond-marked
rectangles show the instance properties. We have marked prevention and treatment as equivalent
concepts (classes) in our ontology, which can be seen in the figure as the equivalence symbol (≡)
marked in their circles. Moreover, as the figure shows, sneeze is a symptom common to measles
and chickenpox, and thus has an edge connecting it to both.
4. EVALUATION
Since we have developed a resource that will be used by other applications, it was necessary to
evaluate whether it retrieves information correctly. We were also keen to verify whether the
ontology links relations and properties correctly and whether the relations and properties of
equivalent classes are retrieved. To verify our ontology we therefore used a reasoner with DL
queries, which checked each relation, each property and in turn each class. We configured
the FaCT++[23] reasoner in Protégé to use description-logic-based queries. When we supplied a
class name to the reasoner, it answered the query in terms of classes, individuals, super-classes,
domain and range. The results convinced us that the ontology was created without defects, since
the system was able to retrieve the desired classes, super-classes and their associated relations
and properties. Figure 5 shows the case when Treatment was given as input: the reasoner fetched
its equivalent class, sub-classes, super-classes, etc. In another case we gave Causes and executed
the query; this fetched all the ancestor super-classes and all the instances of the concept, as
shown in Figure 6. Similarly, each concept, relation and instance was verified for correct results.
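The kind of check these DL queries perform can be sketched as a simple lookup over a hand-built hierarchy. This is an illustration only, not the FaCT++ or Protégé API; the class names and placements are assumptions for the example:

```python
# Minimal sketch: given a class name, retrieve its super-classes, sub-classes
# and equivalent class, mimicking the shape of a DL query answer.
# The hierarchy below is a hand-built assumption, not the actual ontology.

SUBCLASS_OF = {
    "Treatment":   "HumanCommunicableDisease",
    "Prevention":  "HumanCommunicableDisease",
    "Causes":      "HumanCommunicableDisease",
    "Vaccination": "Prevention",   # placement assumed for illustration
}
EQUIVALENT = {"Treatment": "Prevention", "Prevention": "Treatment"}

def describe(cls: str) -> dict:
    """Walk up the hierarchy for ancestors and scan it for direct children."""
    supers, c = [], cls
    while c in SUBCLASS_OF:
        c = SUBCLASS_OF[c]
        supers.append(c)
    subs = [k for k, v in SUBCLASS_OF.items() if v == cls]
    return {"super": supers, "sub": subs, "equivalent": EQUIVALENT.get(cls)}

info = describe("Prevention")
assert info["super"] == ["HumanCommunicableDisease"]
assert info["sub"] == ["Vaccination"]
assert info["equivalent"] == "Treatment"
```

A real reasoner additionally infers implicit subsumptions from class definitions; this sketch only reads back asserted links.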
Figure 4. Visualization of Health Ontology
Figure 5. DL Query for Treatment
Figure 6. DL Query for Causes
5. CONCLUSION
In this paper we have demonstrated the process of developing an ontology in the health domain
and discussed the various phases of its development. We have also evaluated our ontology using a
reasoner, which verified the ontology by querying each class and its properties. As an extension of
this work, we would like to enhance the ontology by adding more diseases and their cures. We
also wish to connect the different components of the ontology to the corresponding web content
and to develop a front end for this ontology that retrieves results for any disease and its
symptoms and also provides the related information available on the web.
REFERENCES
[1] Brickley, D., Guha, R.V., (1999) “Resource Description Framework (RDF) Schema Specification”.
http://www.w3.org/TR/PR-rdf-schema.
[2] Gruber, T.R., (1993) “A translation approach to portable ontology specification”. Knowledge
Acquisition. 5, pp 199–220.
[3] Bhavnani, S.K., Jacob, R.T., Nardine, J., Peck, F.A., (2003) “Exploring the distribution of online
healthcare information”. ACM SIG Computer Human Interaction Annual Conference. pp 816–817.
[4] Spink, A., Yang, Y., Jansen, J. , Nykanen, P., Lorence, D.P., Ozmutlu, S., Ozmutlu, H.C., (2004) “A
study of medical and health queries to web search engines”. Health Information and Libraries Journal,
21, pp 44–51.
[5] Hersh, W.R., Hickam, D.H., (1998) “How well do physicians use electronic information retrieval
systems? A framework for investigation and systematic review”. Journal of the American Medical
Association.
[6] Hersh, W.R., Crabtree, M.R., Hickam, D.H., Sacherek, L., Friedman, C.P., Tidmarsh, P., Mosbaek,
C., Kraemer, D., (2002) “Factors associated with success in searching MEDLINE and applying
evidence to answer clinical questions”. Journal of American Medical Information Association, 9, pp
283–293.
[7] Eastin, M.S., Guinsler, N.M., (2006) “Worried and wired: effects of health anxiety on information-
seeking and health care utilization behaviors”. CyberPsychology & Behavior, 9(4), pp 494–498.
[8] Hearst, M., (1992) “Automatic acquisition of hyponyms from large text corpora”. In: 14th
International Conference on Computational Linguistics.
[9] Poesio, M., Almuhareb, A., (2005) “Identifying concept attributes using a classifier”. In: ACL
Workshop on Deep Lexical Acquisition.
[10] Vashisth, A., Mathur, I., & Joshi, N., (2012) “OntoAna: Domain Ontology for Human
Anatomy”. arXiv preprint arXiv:1208.3802.
[11] Mathur, I., Mathur, S., & Joshi, N., (2011) “Ontology development for health care in India”.
In Proceedings of the International Conference & Workshop on Emerging Trends in Technology (pp.
715-718). ACM.
[12] Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D.,
Yates, A., (2005) “Web-scale information extraction in KnowItAll”. In: 13th World Wide Web
Conference.
[13] Markert, K., Modjeska, N., Nissim, M., (2003) “Using the web for nominal anaphora resolution”. In:
EACL Workshop on the Computational Treatment of Anaphora.
[14] Charniak, E., Berland, M., (1999) “Finding parts in very large corpora”. In: 37th Annual Meeting of
the Association of Computational Linguistics.
[15] Girju, R., Moldovan, D., (2002) “Text mining for causal relations”. In: FLAIRS Conference.
[16] Cederberg S., Widdows, D., (2003) “Using LSA and noun coordination information to improve the
precision and recall of automatic hyponymy extraction”. In: Conference on Natural Language
Learning.
[17] Morin, E., Jacquemin, C., (1999) “Projecting corpus based semantic links on a thesaurus”. In: 37th
Annual Meeting of the Association of Computational Linguistics.
[18] Ravichandran, D., Hovy. E., (2002) “Learning surface patterns for a question answering system”. In:
40th Annual Meeting of the Association of Computational Linguistics.
[19] Buitelaar, P., Olejnik, D., Sintek, M., (2004) “A Protégé plug-in for ontology extraction from text
based linguistic analysis”. In: 1st European Semantic Web Symposium.
[20] Velardi, P., Navigli, R., Cuchuarelli A., Neri, F., (2005) “Evaluation of OntoLearn, a methodology for
automatic population of domain ontologies”
[21] Niang, C., Bouchou, B., Lo, M., (2010) “Towards tailored domain ontologies”. In: 5th
International Workshop on Ontology Matching.
[22] Shamsfard, M., Hesabi, A., Fadaei, H., Mansoory, N., Famian, A., Bagherbeigi, S., Fekri, E.,
Monshizadeh, M., Assi, M., (2010) “Semi Automatic Development of FarsiNet: The Persian
WordNet”. In: 5th Global WordNet Conference.
[23] Tsarkov, D., Horrocks, I., (2006) “FaCT++ Description Logic Reasoner: System Description”. In:
International Joint Conference on Automated Reasoning. Lecture Notes in Artificial Intelligence,
4130, pp 292-297. Springer.
AUTHORS
Mrs. Iti Mathur is an Assistant Professor at Banasthali University. Her primary
area of research is Computational Semantics and Ontological Engineering.
Besides this she is also involved in the development of MT engines for English
to Indian Languages. She is one of the experts empanelled with TDIL
Programme, Department of Electronics and Information Technology (DeitY),
Govt. of India, a premier organization which foresees Language Technology
Funding and Research in India. She has several publications in various journals
and conferences and also serves on the Programme Committees and Editorial
Boards of several conferences and journals.
Dr. Hemant Darbari, Executive Director of the Centre for Development of
Advanced Computing (CDAC), specialises in Artificial Intelligence Systems. He
has opened new avenues through his extensive research on major R&D
projects in Natural Language Processing (NLP), Machine assisted Translation
(MT), Information Extraction and Information Retrieval (IE/IR), Speech
Technology, Mobile computing, Decision Support System and Simulations. He
is a recipient of the prestigious "Computerworld Smithsonian Award Medal"
from the Smithsonian Institution, USA for his outstanding work on MANTRA-
Machine Assisted Translation Tool which is also a part of "The 1999
Innovation Collection" at National Museum of American History, Washington
DC, USA.
Mr. Nisheeth Joshi is a researcher working in the area of Machine Translation.
He has been primarily working in design and development of evaluation
metrics in Indian Languages. Besides this he is also very actively involved in
the development of MT engines for English to Indian Languages. He is one of
the experts empanelled with TDIL Programme, Department of Information
Technology, Govt. of India, a premier organization which foresees Language
Technology Funding and Research in India. He has several publications in
various journals and conferences and also serves on the Programme
Committees and Editorial Boards of several conferences and journals.