This document discusses using hash-based indexing to improve case retrieval in case-based reasoning systems. It proposes integrating hash indexing into the case retrieval process to allow faster retrieval of cases from large case bases. The document describes how hash indexing works and how it can be applied to case retrieval by mapping case features to hash table indexes. An experiment applies hash indexing to a dataset of daily dam operation data and compares retrieval performance against sequential indexing. The results indicate that hash indexing retrieves cases both more accurately and faster than the sequential approach.
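As a rough illustration of the idea, the following Python sketch maps discretized case features to hash-table buckets so that retrieval is an expected O(1) lookup instead of an O(n) sequential scan. The feature names, the rounding scheme, and the dam-operation fields are illustrative assumptions, not the paper's actual design.

# Hedged sketch: hash-based case retrieval, assuming cases are dicts of
# numeric features. The feature names and the bucketing are illustrative.
from collections import defaultdict

def feature_key(case, precision=1):
    # Map a case's features to a hashable key; rounding makes near-equal
    # readings collide into the same bucket.
    return (round(case["inflow"], precision),
            round(case["water_level"], precision))

class HashedCaseBase:
    def __init__(self):
        self.table = defaultdict(list)  # hash bucket -> list of cases

    def add(self, case):
        self.table[feature_key(case)].append(case)

    def retrieve(self, query):
        # Expected O(1) bucket lookup rather than scanning the whole base.
        return self.table.get(feature_key(query), [])

cb = HashedCaseBase()
cb.add({"inflow": 120.4, "water_level": 35.2, "action": "open gate 2"})
print(cb.retrieve({"inflow": 120.41, "water_level": 35.23}))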
Efficient multi-document summary generation using neural network (INFOGAIN PUBLICATION)
This paper proposes a multi-document summarization system that uses bisect k-means clustering, an optimal merge function, and a neural network. The system first preprocesses input documents through stemming and removing stop words. It then applies bisect k-means clustering to group similar sentences. The clusters are merged using an optimal merge function to find important keywords. The NEWSUM algorithm is used to generate a primary summary for each keyword. A neural network trained on sentence classifications is then used to classify sentences in the primary summary as positive or negative. Only positively classified sentences are included in the final summary to improve accuracy. The system aims to generate a concise and accurate summary in a short period of time from multiple documents on a given topic.
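A minimal sketch of the bisecting k-means stage is given below, assuming sentences are already TF-IDF vectorized; the optimal merge function, NEWSUM, and the neural-network filter are not shown, and the corpus and target cluster count are toy values.

# Bisecting k-means over sentences: repeatedly split the largest cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

sentences = ["stocks fell sharply today", "markets dropped on rate fears",
             "the team won the final", "fans celebrated the victory"]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences).toarray()

clusters = [np.arange(len(sentences))]   # start with one cluster of everything
while len(clusters) < 2:                 # bisect until the target count is reached
    i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
    biggest = clusters.pop(i)            # always split the largest cluster
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[biggest])
    clusters += [biggest[labels == 0], biggest[labels == 1]]

for c in clusters:
    print([sentences[i] for i in c])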
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F... (ijaia)
Regression models and their statistical analyses are among the most important tools used by scientists and practitioners. The aim of a regression model is to fit parametric functions to data. Since the true regression function is unknown, specific methods are created and used strictly pertaining to the problem at hand. For pioneering work on procedures for fitting functions, we refer to the methods of least absolute deviations, least squares deviations, and minimax absolute deviations. Today's widely celebrated method of least squares for function fitting is credited to the published works of Legendre and Gauss. However, least squares based models may in practice fail to provide optimal results in non-Gaussian situations, especially when the errors follow distributions with fat tails. In this paper an unorthodox method of estimating linear regression coefficients by minimising the GMSE (geometric mean of squared errors) is explored. Though the GMSE is used to compare models, it is rarely used to obtain the coefficients; such a method is tedious to handle due to the large number of roots obtained by minimising the loss function, and this paper offers a way to tackle that problem. Application is illustrated with the 'Advertising' dataset from ISLR, and the obtained results are compared with those of the method of least squares for a single-index linear regression model.
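A hedged numeric sketch of the idea follows: fit y = b0 + b1*x by minimising the logarithm of the GMSE, i.e. the mean of log(e_i^2), with a small epsilon keeping the logarithm finite. The synthetic fat-tailed data stand in for the ISLR 'Advertising' dataset, and direct numerical minimisation stands in for the paper's root-based procedure.

# Minimise log GMSE = mean(log(e_i^2)); compare against ordinary least squares.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.standard_t(df=2, size=100)   # fat-tailed noise

def log_gmse(beta, eps=1e-12):
    e = y - (beta[0] + beta[1] * x)
    return np.mean(np.log(e**2 + eps))               # eps avoids log(0)

res = minimize(log_gmse, x0=np.polyfit(x, y, 1)[::-1], method="Nelder-Mead")
print("GMSE fit:", res.x)   # [intercept, slope]
print("OLS fit :", np.polyfit(x, y, 1)[::-1])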
IRJET: Concept Extraction from Ambiguous Text Document using K-Means (IRJET Journal)
This document discusses using a K-means clustering algorithm to extract concepts from ambiguous text documents. It involves preprocessing the text by tokenizing, removing stop words, and stemming words. The words are then represented as vectors and dimensionality reduction using PCA is applied. Finally, K-means clustering is used to group similar words into clusters to identify the overall concepts in the document without reading the entire text. The aim is to help users understand the key topics in a document in a time-efficient manner without having to read the full text.
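A compact sketch of this pipeline might look as follows, clustering words by their profiles across sentences after PCA; the corpus, the sentence split, and k are illustrative assumptions.

# Tokenize/vectorize, reduce with PCA, then K-means over word profiles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

doc = "banks lend money while river banks flood; loans and interest rates rise"
vec = TfidfVectorizer(stop_words="english")          # tokenizes, drops stop words
X = vec.fit_transform(doc.split(";"))                # sentence-level vectors
words = vec.get_feature_names_out()

# Each word is represented by its column profile across sentences.
W = PCA(n_components=2).fit_transform(X.T.toarray())
labels = KMeans(n_clusters=2, n_init=10).fit_predict(W)
for k in range(2):
    print(k, [w for w, l in zip(words, labels) if l == k])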
An Improved Similarity Matching based Clustering Framework for Short and Sent... (IJECEIAES)
Text clustering plays a key role in the navigation and browsing process. For efficient text clustering, large amounts of information are grouped into meaningful clusters. Many text clustering techniques do not address issues such as high time and space complexity, inability to understand the relational and contextual attributes of words, limited robustness, and risks related to privacy exposure. To address these issues, an efficient text-based clustering framework is proposed. The Reuters dataset is chosen as the input. Once the input dataset is preprocessed, the similarity between words is computed using cosine similarity. The similarities between the components are compared and the vector data is created, from which the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance of the proposed framework is analyzed using Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), and processing time. The experimental results show that the proposed framework produces better MSE, PSNR, and processing time than the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
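For reference, the cosine similarity used in the first step reduces to the following; the mutation and optimization stages of the framework are not shown.

# Cosine similarity between two term vectors.
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.5, 0.1, 1.9])
print(cosine(u, v))   # close to 1.0 for near-parallel vectors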
International Journal of Engineering and Science Invention (IJESI), inventionjournals
This document proposes a new approach for processing multiple queries for a patented medical database that handles temporal domain events more efficiently. The approach uses three techniques: automatic error correction, topic relevant query suggestion, and extended query augmentation. Queries are first used to retrieve coherent clusters from topic-classified medical data. Relevant results from each cluster are then combined to generate the top K answers for the user. The techniques aim to better define the user's information need and improve retrieval speed and memory usage when searching large medical databases. An experimental evaluation found the approach improved retrieval quality and efficiency compared to existing methods.
Evaluating the efficiency of rule techniques for file classification (eSAT Journals)
Abstract: Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining, and its first step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, or xls). Several rule-classifier algorithms exist, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR, and ZeroR. Here, three of them, decision table, DTNB, and OneR, are used to classify computer files by extension. The results produced by these algorithms are analyzed using classification accuracy and error rate as performance factors. The experimental results show DTNB to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
Enhancement techniques for data warehouse staging area (IJDKP)
This document discusses techniques for enhancing the performance of data warehouse staging areas. It proposes two algorithms: 1) A semantics-based extraction algorithm that reduces extraction time by pruning useless data using semantic information. 2) A semantics-based transformation algorithm that similarly aims to reduce transformation time. It also explores three scheduling techniques (FIFO, minimum cost, round robin) for loading data into the data warehouse and experimentally evaluates their performance. The goal is to enhance each stage of the ETL process to maximize overall performance.
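The three loading schedules can be caricatured as below; approximating "minimum cost" as shortest-job-first, and the job names and costs, are assumptions for illustration.

# Toy sketch of FIFO, minimum-cost (shortest job first), and round-robin
# orderings of warehouse load jobs.
from collections import deque

jobs = [("sales", 5), ("inventory", 2), ("hr", 8), ("web", 1)]

fifo = [name for name, _ in jobs]
min_cost = [name for name, _ in sorted(jobs, key=lambda j: j[1])]

queues = {"q1": deque(jobs[:2]), "q2": deque(jobs[2:])}
round_robin = []
while any(queues.values()):            # alternate between source queues
    for q in queues.values():
        if q:
            round_robin.append(q.popleft()[0])

print(fifo, min_cost, round_robin, sep="\n")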
Research Paper Selection Based On an Ontology and Text Mining Technique Using... (IOSR Journals)
This document proposes an ontology and text mining technique to select research papers. It involves 3 phases: 1) constructing a research ontology using keywords and frequencies from past papers, 2) classifying new papers based on ontology keywords, and 3) clustering papers in each domain using text mining and the K-means algorithm. The technique aims to better group papers and assign them to relevant reviewers by addressing limitations of keyword-based methods. It constructs a research ontology, classifies papers, clusters them based on textual similarities, and systematically assigns papers to reviewers.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label list that can help the user interpret the labels and the search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for the query and losing extra time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced, with emphasis placed on producing comprehensible and accurate cluster labels in addition to discovering the document clusters. We also present a new metric to assess the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods, Suffix Tree Clustering and Lingo, and we perform the experiments using the publicly available datasets Ambient and ODP-239.
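The paper's own labelling method and metric are not reproduced here, but a common baseline this line of work competes with, labelling a cluster by its top centroid TF-IDF terms, can be sketched as follows; the documents are toy examples.

# Label a cluster with the highest-weight terms of its TF-IDF centroid.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

cluster_docs = ["apple releases new phone", "phone sales grow at apple",
                "smartphone market expands"]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(cluster_docs)
centroid = np.asarray(X.mean(axis=0)).ravel()
terms = vec.get_feature_names_out()
label = [terms[i] for i in centroid.argsort()[::-1][:3]]
print("cluster label:", label)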
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE (IJDKP)
Metadata represents information about the data to be stored in data warehouses and is a mandatory element for building an efficient data warehouse. Metadata helps in data integration, lineage, data quality, and populating transformed data into the warehouse. Spatial data warehouses are based on spatial data mostly collected from Geographical Information Systems (GIS) and from transactional systems specific to an application or enterprise. Metadata design and deployment is the most critical phase in building a data warehouse, where it is mandatory to bring spatial information and data modeling together. In this paper, we present a holistic metadata framework that drives metadata creation for a spatial data warehouse. Theoretically, the proposed framework improves the efficiency of data access in response to frequent queries on SDWs; in other words, it decreases query response time while accurate information, including the spatial information, is fetched from the data warehouse.
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Because the user query is on average restricted to two or three keywords, it is ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that might be useful or relevant to the information need of the user; hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation is based on the dataset specified by the Forum for Information Retrieval Evaluation [FIRE15], using precision and relative recall as criteria, and the analysis rests on the importance of each step in query processing. The experimental results show the significance of each step in query processing, as well as the relevance of web semantics and spelling correction in the user query.
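Two of the named steps, spelling correction and query expansion, can be illustrated with the sketch below; the vocabulary, synonym table, and difflib-based corrector are toy stand-ins, not the algorithms evaluated in the paper.

# Correct query terms against a vocabulary, then expand with synonyms.
import difflib

vocabulary = ["information", "retrieval", "ranking", "query"]
synonyms = {"retrieval": ["search", "lookup"]}

def process(query):
    corrected = [difflib.get_close_matches(w, vocabulary, n=1, cutoff=0.7)
                 or [w] for w in query.lower().split()]
    corrected = [w[0] for w in corrected]
    return corrected + [s for w in corrected for s in synonyms.get(w, [])]

print(process("informaton retrival"))
# -> ['information', 'retrieval', 'search', 'lookup']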
IRJET: Text Document Clustering using K-Means Algorithm (IRJET Journal)
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
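A sketch of the compared pipeline follows, with scikit-learn's mutual-information selector standing in for the InfoGain step (an assumption) on a toy corpus; labels are needed only for the supervised selection.

# Document-term matrix -> information-gain-style selection -> K-means.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.cluster import KMeans

docs = ["goal scored in football match", "tennis player wins final",
        "football team defends title", "tennis tournament starts"]
labels = [0, 1, 0, 1]
X = CountVectorizer().fit_transform(docs)
X_red = SelectKBest(mutual_info_classif, k=4).fit_transform(X, labels)
print(KMeans(n_clusters=2, n_init=10).fit_predict(X_red.toarray()))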
An effective preprocessing algorithm for information retrieval systems (ijdms)
The Internet is probably the most successful distributed computing system ever. However, our capabilities for data querying and manipulation on the Internet are primordial at best. User expectations have been rising over time, along with the increasing amount of operational data over the past few decades; the data user expects deeper, more exact, and more detailed results. Result retrieval for a user query is always relative to the pattern of data storage and indexing. In information retrieval systems, tokenization is an integral part whose prime objective is to identify tokens and their counts. In this paper, we propose an effective tokenization approach based on a training vector, and the results show the efficiency and effectiveness of the proposed algorithm. Tokenization of documents, which helps satisfy the user's information need more precisely and sharply reduces the search space, is considered a part of information retrieval. Pre-processing of the input documents is an integral part of tokenization: each document is pre-processed and its respective tokens are generated, and on the basis of these tokens probabilistic IR generates its scoring and yields a reduced search space. The comparative analysis is based on two parameters: the number of tokens generated and the pre-processing time.
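A minimal sketch of the two reported parameters, the number of tokens generated and the pre-processing time, assuming a simple regex tokenizer and stop-word list rather than the paper's training-vector approach:

# Tokenize, drop stop words, and report token counts plus elapsed time.
import re, time
from collections import Counter

STOP = {"the", "is", "a", "of"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP]

start = time.perf_counter()
tokens = tokenize("The Internet is probably the most successful system of all.")
elapsed = time.perf_counter() - start
print(Counter(tokens), f"{elapsed:.6f}s")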
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access remain a big problem for the development of GIS applications, as no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data query. The paper describes a framework for integrating information from repositories containing different vector data set formats and repositories containing raster datasets. The presented approach converts different vector data formats into a single unified format (File Geodatabase, "GDB"). In addition, we employ metadata to support a wide range of user queries that retrieve relevant geographic information from heterogeneous and distributed repositories, which enhances both query processing and performance.
As databases develop, the volume of data stored in them increases rapidly, and much important information lies hidden in these large amounts of data. If that information can be extracted from the database, it can create a lot of value for the organization. The question organizations are asking is how to extract this value, and the answer is data mining. Many technologies are available to data mining practitioners, including artificial neural networks, genetic algorithms, fuzzy logic, and decision trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and questions their position as a preferred tool of data mining practitioners.
Text preprocessing is a vital stage in text classification (TC) particularly and in text mining generally. The purpose of text preprocessing tools is to reduce the multiple forms of a word to one form. Text preprocessing techniques are given considerable attention and are widely studied in machine learning. The basic phase in text classification involves preprocessing features and extracting relevant features against the features in a database; these steps have a great impact on reducing the time and processing resources needed. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification, covering the raw text, tokenization, stop-word removal, and stemming. Two different feature extraction methods, chi-square and TF-IDF with a cosine similarity score, are used on the BBC English dataset. The experimental results show that text preprocessing affects the feature extraction methods and enhances the performance of English text classification, especially for small threshold values.
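The two feature-extraction routes compared in the study can be sketched as follows, with a toy corpus and labels standing in for the BBC dataset:

# Route 1: TF-IDF + cosine similarity. Route 2: chi-square feature selection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics.pairwise import cosine_similarity

docs = ["market shares rise", "team wins cup", "stocks rally", "league match ends"]
labels = [0, 1, 0, 1]

tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf[2]))                # TF-IDF + cosine

selected = SelectKBest(chi2, k=3).fit_transform(tfidf, labels)  # chi-square
print(selected.shape)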
New proximity estimate for incremental update of non uniformly distributed cl... (IJDKP)
The conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters. Many real-life databases keep growing incrementally, and for such dynamic databases the patterns extracted from the original database become obsolete. Thus the conventional clustering algorithms are not suitable for incremental databases, owing to their lack of capability to modify the clustering results in accordance with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-Based Incremental Clustering Approach for numerical data) to handle numerical data, and suggests a new proximity metric called Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA makes use of the proposed proximity metric to determine the membership of a data point in a cluster.
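The exact IPE formula is not given in this summary, so the sketch below encodes one plausible reading, combining the distance to the cluster representative with the distance to the farthest cluster member; the combination rule is an assumption, not the paper's definition.

# Hedged IPE-style proximity: representative distance x farthest-member distance.
import numpy as np

def ipe(point, centroid, members):
    d_rep = np.linalg.norm(point - centroid)
    d_far = max(np.linalg.norm(point - m) for m in members)
    return d_rep * d_far          # illustrative combination only

cluster = [np.array([1.0, 1.0]), np.array([1.2, 0.9]), np.array([5.0, 5.0])]
centroid = np.mean(cluster, axis=0)
print(ipe(np.array([1.1, 1.0]), centroid, cluster))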
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY (IJDKP)
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
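A sketch in the spirit of that measure, counting fully and partially matching statements between two extracted statement sequences, is shown below; the 0.5 weight for partial matches and the keyword-based notion of a partial match are assumptions.

# Similarity from fully and partially matching statements in two sequences.
def seq_similarity(a, b):
    full = sum(1 for s, t in zip(a, b) if s == t)
    partial = sum(1 for s, t in zip(a, b)
                  if s != t and s.split()[0] == t.split()[0])  # same keyword
    return (full + 0.5 * partial) / max(len(a), len(b))

m1 = ["for i", "if x", "return x"]
m2 = ["for j", "if x", "return y"]
print(seq_similarity(m1, m2))   # 1 full + 2 partial -> (1 + 1.0) / 3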
One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data so as to give individuals or institutions useful information about market behavior for investment decisions. Investment can be considered one of the fundamental pillars of a national economy, and at present many investors look for criteria to compare stocks and select the best, choosing strategies that maximize the earning value of the investment process. The enormous amount of valuable data generated by the stock market has therefore attracted researchers to explore this problem domain with different methodologies, and research in data mining has gained high attraction due to the importance of its applications and the increasing generation of information. Data mining tools such as association rules, rule induction methods, and the Apriori algorithm are used to find associations between different scripts of the stock market, and much research and development has addressed the reasons for fluctuations in the Indian stock exchange. Nowadays, however, two important factors, gold prices and US dollar prices, dominate the Indian stock market; statistical correlation is used to find the correlation between gold prices, dollar prices, and the BSE index, which helps the activities of stock operators, brokers, investors, and jobbers. These are based on forecasting the fluctuation of index share prices, gold prices, dollar prices, and customer transactions. Hence the researcher has taken these problems as a topic for research.
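The statistical-correlation step can be illustrated as follows, with synthetic series standing in for the gold, dollar, and BSE index data:

# Pearson correlation between gold, USD, and a synthetic stock index.
import numpy as np

rng = np.random.default_rng(1)
gold = rng.normal(1800, 50, 250)
usd = rng.normal(74, 2, 250)
bse = 40000 - 5 * gold + 300 * usd + rng.normal(0, 500, 250)

print("gold vs BSE:", np.corrcoef(gold, bse)[0, 1])
print("USD  vs BSE:", np.corrcoef(usd, bse)[0, 1])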
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension... (IOSR Journals)
This document summarizes and compares three clustering algorithms: DBSCAN, k-means, and SOM (Self-Organizing Maps). It first provides background on spatial data mining and clustering techniques. It then describes the DBSCAN, k-means, and SOM algorithms in detail, explaining their key steps and properties. The document evaluates the performance of these three algorithms on two-dimensional spatial datasets, analyzing the density-based clustering characteristics of each. It finds that DBSCAN can identify clusters of varying shapes and sizes with noise, while k-means partitions data into a predefined number of clusters and SOM uses neural networks to cluster unlabeled data.
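The contrast drawn between DBSCAN and k-means can be reproduced on a toy two-ring dataset; eps and the dataset parameters are illustrative.

# DBSCAN separates the two rings (possibly labelling some noise -1);
# k-means, forced into two spherical partitions, cannot.
from sklearn.datasets import make_circles
from sklearn.cluster import DBSCAN, KMeans

X, _ = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)
print("DBSCAN :", set(DBSCAN(eps=0.15).fit_predict(X)))
print("k-means:", set(KMeans(n_clusters=2, n_init=10).fit_predict(X)))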
Ontology Based PMSE with Manifold Preference (IJCERT)
This document summarizes an article that proposes using genetic algorithms to improve the effectiveness of a text to matrix generator. It begins with an overview of information retrieval and discusses the vector space model and genetic algorithms. It then proposes a genetic approach to optimize the objective function of a text to matrix generator to increase the average number of terms. The goal is to retrieve more relevant documents by obtaining the best combination of terms from document collections using genetic algorithms. Experimental results are presented to validate that the genetic approach improves the performance of the text to matrix generator.
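The genetic search over term combinations can be caricatured as below; the fitness function (average matched terms per document, with a small size penalty), the operators, and all parameters are illustrative assumptions, not the article's objective.

# Toy GA: individuals are binary masks over the vocabulary.
import random

random.seed(0)
vocab = ["data", "mining", "retrieval", "genetic", "matrix", "noise"]
docs = [{"data", "mining", "matrix"}, {"retrieval", "genetic", "data"}]

def fitness(mask):
    chosen = {t for t, m in zip(vocab, mask) if m}
    hits = sum(len(chosen & d) for d in docs) / len(docs)
    return hits - 0.1 * sum(mask)      # small penalty for large term sets

pop = [[random.randint(0, 1) for _ in vocab] for _ in range(8)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                  # elitist selection
    children = []
    for _ in range(4):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(vocab))   # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:               # mutation
            i = random.randrange(len(vocab))
            child[i] ^= 1
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print([t for t, m in zip(vocab, best) if m], fitness(best))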
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately with little training data. The semi-supervised classification technique uses labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial.

On the other hand, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency of semi-supervised document classification.

We use support vector machines, one of the most effective algorithms that have been studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an increase in accuracy of 4% on average, and of up to 20%, in comparison with the traditional semi-supervised model.
Feature selection, optimization and clustering strategies of text documents (IJECEIAES)
Clustering is one of the most researched areas of data mining in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task must be performed before adapting it to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers; efforts have also been put forward to achieve efficient clustering for categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, the paper details prominent models proposed for clustering, along with the pros and cons of each model, and focuses on the latest developments in the clustering task in social networks and associated environments.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING (ijnlc)
In this paper, we propose a novel algorithm that rearranges the topic assignment results obtained from topic modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how much the results conform to expert opinion, which we encode in a data structure called a TDAG that represents the probability that a pair of highly correlated words appear together. To make sure the internal structure does not change too much through the rearrangement, coherence, a well-known metric for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure. We develop two ways to systematically obtain the expert opinion from data, depending on whether the data has relevant expert writing or not. The final algorithm, which takes both coherence and expert opinion into account, is presented, and finally we compare the amount of adjustment needed for each topic modeling method, NMF and LDA.
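The two topic-modeling baselines the algorithm post-processes can be run as below; the TDAG rearrangement itself is not shown, and the corpus is a toy example.

# NMF over TF-IDF and LDA over counts; print the top terms of each topic.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation

docs = ["gene expression in cells", "stock market prices fall",
        "protein and gene function", "investors watch market trends"]

tfidf = TfidfVectorizer(stop_words="english")
nmf = NMF(n_components=2, init="nndsvd").fit(tfidf.fit_transform(docs))

counts = CountVectorizer(stop_words="english")
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts.fit_transform(docs))

for name, model, vec in [("NMF", nmf, tfidf), ("LDA", lda, counts)]:
    terms = vec.get_feature_names_out()
    for topic in model.components_:
        print(name, [terms[i] for i in topic.argsort()[::-1][:3]])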
Improved method for pattern discovery in text mining (eSAT Journals)
This document summarizes an improved method for pattern discovery in text mining proposed by Bharate Laxman and D. Sujatha. The method implements a novel pattern discovery technique proposed by Zhong et al. that discovers patterns from text documents and computes pattern specificities to evaluate term weights. The authors built a prototype application to test the technique. Experimental results showed the solution is useful for text mining as it avoids problems of misinterpretation and low frequency compared to previous methods.
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY (cscpconf)
A digital library is a type of information retrieval (IR) system. Existing information retrieval methodologies generally have problems with keyword searching. We propose a model to solve this problem using a concept-based approach (ontology) and a metadata case base. The model consists of identifying domain concepts in the user's query and applying expansion to them. The system aims to improve the relevance of results retrieved from digital libraries by proposing conceptual query expansion for intelligent concept-based retrieval. We import the notion of ontology, making use of its advantages of abundant semantics and standard concepts. A domain-specific ontology can be used to lift information retrieval from the traditional keyword-based level to a knowledge (or concept) based level, changing the retrieval process from traditional keyword matching to semantic matching. One approach is query expansion using the domain ontology; the other introduces a case-based similarity measure for metadata information retrieval using the Case-Based Reasoning (CBR) approach. Results show improvements over the classic method, over query expansion using a general-purpose ontology, and over a number of other approaches.
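Conceptual query expansion of the kind described can be sketched with a toy ontology; the concept names and relations are illustrative assumptions.

# Map a query to a domain concept and add broader and related terms.
ontology = {
    "heart attack": {"broader": "cardiovascular disease",
                     "related": ["myocardial infarction", "angina"]},
}

def expand(query):
    terms = [query.lower()]
    concept = ontology.get(query.lower())
    if concept:
        terms.append(concept["broader"])
        terms.extend(concept["related"])
    return terms

print(expand("Heart Attack"))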
Semantics-based clustering approach for similar research area detection (TELKOMNIKA JOURNAL)
The manual process of searching out individuals in an existing research field is cumbersome and time-consuming. Prominent and rookie researchers alike are predisposed to seek existing research publications in a field of interest before coming up with a thesis. In the extant literature, automated similar-research-area detection systems have been developed to solve this problem; however, most of them use keyword-matching techniques, which do not sufficiently capture the implicit semantics of keywords and thereby leave out some research articles. In this study, we propose the use of ontology-based pre-processing, Latent Semantic Indexing, and K-Means clustering to develop a prototype similar-research-area detection system that can be used to determine publications in similar research domains. Our proposed system solves the challenges of high dimensionality and data sparsity faced by traditional document clustering techniques. The system is evaluated with randomly selected publications from faculties in Nigerian universities, and the results show that integrating ontologies in preprocessing provides more accurate clustering results.
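The core of the proposed pipeline, TF-IDF followed by LSI via truncated SVD and then K-Means, can be sketched as follows; the ontology-based preprocessing is reduced to a stop-word list here, and the abstracts are toy examples.

# TF-IDF -> LSI (truncated SVD) -> K-means over the reduced space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

abstracts = ["deep learning for image recognition",
             "convolutional networks classify images",
             "crop yield prediction in agriculture",
             "soil sensors improve farm irrigation"]
X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(KMeans(n_clusters=2, n_init=10).fit_predict(lsi))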
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
/produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that employs to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
we perform the experiments using the publicly available Datasets Ambient and ODP-239
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
Metadata represents the information about data to be stored in Data Warehouses. It is a mandatory
element of Data Warehouse to build an efficient Data Warehouse. Metadata helps in data integration,
lineage, data quality and populating transformed data into data warehouse. Spatial data warehouses are
based on spatial data mostly collected from Geographical Information Systems (GIS) and the transactional
systems that are specific to an application or enterprise. Metadata design and deployment is the most
critical phase in building of data warehouse where it is mandatory to bring the spatial information and
data modeling together. In this paper, we present a holistic metadata framework that drives metadata
creation for spatial data warehouse. Theoretically, the proposed metadata framework improves the
efficiency of accessing of data in response to frequent queries on SDWs. In other words, the proposed
framework decreases the response time of the query and accurate information is fetched from Data
Warehouse including the spatial information
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
The first element of the search process is the query.
The user query being on an average restricted to two or three
keywords makes the query ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
[IR] system is to retrieve information which might be useful
or relevant to the information need of the user. Hence, the
query processing plays an important role in IR system.
The query processing can be divided into four categories
i.e. query expansion, query optimization, query classification and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each of the
category. The evaluation was based on dataset as specified by
Forum for Information Retrieval [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show that the significance of each step
in query processing and also the relevance of web semantics
and spelling correction in the user query.
IRJET- Text Document Clustering using K-Means Algorithm IRJET Journal
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
An effective pre processing algorithm for information retrieval systemsijdms
The Internet is probably the most successful distributed computing system ever. However, our capabilities
for data querying and manipulation on the internet are primordial at best. The user expectations are
enhancing over the period of time along with increased amount of operational data past few decades. The
data-user expects more deep, exact, and detailed results. Result retrieval for the user query is always
relative o the pattern of data storage and index. In Information retrieval systems, tokenization is an
integrals part whose prime objective is to identifying the token and their count. In this paper, we have
proposed an effective tokenization approach which is based on training vector and result shows that
efficiency/ effectiveness of proposed algorithm. Tokenization on documents helps to satisfy user’s
information need more precisely and reduced search sharply, is believed to be a part of information
retrieval. Pre-processing of input document is an integral part of Tokenization, which involves preprocessing
of documents and generates its respective tokens which is the basis of these tokens probabilistic
IR generate its scoring and gives reduced search space. The comparative analysis is based on the two
parameters; Number of Token generated, Pre-processing time.
With the rapid development in Geographic Information Systems (GISs) and their applications, more and
more geo-graphical databases have been developed by different vendors. However, data integration and
accessing is still a big problem for the development of GIS applications as no interoperability exists among
different spatial databases. In this paper we propose a unified approach for spatial data query. The paper
describes a framework for integrating information from repositories containing different vector data sets
formats and repositories containing raster datasets. The presented approach converts different vector data
formats into a single unified format (File Geo-Database “GDB”). In addition, we employ “metadata” to
support a wide range of users’ queries to retrieve relevant geographic information from heterogeneous and
distributed repositories. Such an employment enhances both query processing and performance.
With the development of database, the data volume stored in database increases rapidly and in the large
amounts of data much important information is hidden. If the information can be extracted from the
database they will create a lot of profit for the organization. The question they are asking is how to extract
this value. The answer is data mining. There are many technologies available to data mining practitioners,
including Artificial Neural Networks, Genetics, Fuzzy logic and Decision Trees. Many practitioners are
wary of Neural Networks due to their black box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool by data mining practitioners.
Text preprocessing is a vital stage in text classification (TC) particularly and text mining generally. Text preprocessing tools is to reduce multiple forms of the word to one form. In addition, text preprocessing techniques are provided a lot of significance and widely studied in machine learning. The basic phase in text classification involves preprocessing features, extracting relevant features against the features in a database. However, they have a great impact on reducing the time requirement and speed resources needed. The effect of the preprocessing tools on English text classification is an area of research. This paper provides an evaluation study of several preprocessing tools for English text classification. The study includes using the raw text, the tokenization, the stop words, and the stemmed. Two different methods chi-square and TF-IDF with cosine similarity score for feature extraction are used based on BBC English dataset. The Experimental results show that the text preprocessing effect on the feature extraction methods that enhances the performance of English text classification especially for small threshold values.
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
The conventional clustering algorithms mine static databases and generate a set of patterns in the form of
clusters. Many real life databases keep growing incrementally. For such dynamic databases, the patterns
extracted from the original database become obsolete. Thus the conventional clustering algorithms are not
suitable for incremental databases due to lack of capability to modify the clustering results in accordance
with recent updates. In this paper, the author proposes a new incremental clustering algorithm called
CFICA(Cluster Feature-Based Incremental Clustering Approach for numerical data) to handle numerical
data and suggests a new proximity metric called Inverse Proximity Estimate (IPE) which considers the
proximity of a data point to a cluster representative as well as its proximity to a farthest point in its vicinity.
CFICA makes use of the proposed proximity metric to determine the membership of a data point into a
cluster.
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITYIJDKP
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
One of the most important problems in modern finance is finding efficient ways to summarize and visualize
the stock market data to give individuals or institutions useful information about the market behavior for
investment decisions Therefore, Investment can be considered as one of the fundamental pillars of national
economy. So, at the present time many investors look to find criterion to compare stocks together and
selecting the best and also investors choose strategies that maximize the earning value of the investment
process. Therefore the enormous amount of valuable data generated by the stock market has attracted
researchers to explore this problem domain using different methodologies. Therefore research in data
mining has gained a high attraction due to the importance of its applications and the increasing generation
information. So, Data mining tools such as association rule, rule induction method and Apriori algorithm
techniques are used to find association between different scripts of stock market, and also much of the
research and development has taken place regarding the reasons for fluctuating Indian stock exchange.
But, now days there are two important factors such as gold prices and US Dollar Prices are more
dominating on Indian Stock Market and to find out the correlation between gold prices, dollar prices and
BSE index statistical correlation is used and this helps the activities of stock operators, brokers, investors
and jobbers. They are based on the forecasting the fluctuation of index share prices, gold prices, dollar
prices and transactions of customers. Hence researcher has considered these problems as a topic for
research.
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...IOSR Journals
This document summarizes and compares three clustering algorithms: DBSCAN, k-means, and SOM (Self-Organizing Maps). It first provides background on spatial data mining and clustering techniques. It then describes the DBSCAN, k-means, and SOM algorithms in detail, explaining their key steps and properties. The document evaluates the performance of these three algorithms on two-dimensional spatial datasets, analyzing the density-based clustering characteristics of each. It finds that DBSCAN can identify clusters of varying shapes and sizes with noise, while k-means partitions data into a predefined number of clusters and SOM uses neural networks to cluster unlabeled data.
Ontology Based PMSE with Manifold PreferenceIJCERT
International journal from http://www.ijcert.org
IJCERT Standard on-line Journal
ISSN(Online):2349-7084,(An ISO 9001:2008 Certified Journal)
iso nicir csir
IJCERT (ISSN 2349–7084 (Online)) is approved by National Science Library (NSL), National Institute of Science Communication And Information Resources (NISCAIR), Council of Scientific and Industrial Research, New Delhi, India.
This document summarizes an article that proposes using genetic algorithms to improve the effectiveness of a text to matrix generator. It begins with an overview of information retrieval and discusses the vector space model and genetic algorithms. It then proposes a genetic approach to optimize the objective function of a text to matrix generator to increase the average number of terms. The goal is to retrieve more relevant documents by obtaining the best combination of terms from document collections using genetic algorithms. Experimental results are presented to validate that the genetic approach improves the performance of the text to matrix generator.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...IJDKP
Many applications of automatic document classification require learning accurately with little training
data. The semi-supervised classification technique uses labeled and unlabeled data for training. This
technique has shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
On the other hand, the emergence of web technologies has originated the collaborative development of
ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency
of the semi-supervised document classification.
We used support vector machines, which is one of the most effective algorithms that have been studied for
text. Our algorithm enhances the performance of transductive support vector machines through the use of
ontologies. We report experimental results applying our algorithm to three different datasets. Our
experiments show an increment of accuracy of 4% on average and up to 20%, in comparison with the
traditional semi-supervised model.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGijnlc
In this paper, we propose a novel algorithm that rearrange the topic assignment results obtained from topic
modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how much
the results conform to expert opinion, which is a data structure called TDAG that we defined to represent the
probability that a pair of highly correlated words appear together. In order to make sure that the internal
structure does not get changed too much from the rearrangement, coherence, which is a well known metric
for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure.
We developed two ways to systematically obtain the expert opinion from data, depending on whether the
data has relevant expert writing or not. The final algorithm which takes into account both coherence and
expert opinion is presented. Finally we compare amount of adjustments needed to be done for each topic
modeling method, NMF and LDA.
Improved method for pattern discovery in text miningeSAT Journals
This document summarizes an improved method for pattern discovery in text mining proposed by Bharate Laxman and D. Sujatha. The method implements a novel pattern discovery technique proposed by Zhong et al. that discovers patterns from text documents and computes pattern specificities to evaluate term weights. The authors built a prototype application to test the technique. Experimental results showed the solution is useful for text mining as it avoids problems of misinterpretation and low frequency compared to previous methods.
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYcscpconf
A digital library is a type of information retrieval (IR) system. Existing information retrieval methodologies generally have problems with keyword searching. We propose a model to solve this problem using a concept-based approach (ontology) and a metadata case base. The model consists of identifying domain concepts in the user's query and applying expansion to them. The system aims to improve the relevance of results retrieved from digital libraries by proposing conceptual query expansion for intelligent concept-based retrieval. We import the concept of ontology, making use of its rich semantics and standardized concepts. A domain-specific ontology can raise information retrieval from the traditional keyword-based level to a knowledge-based (concept) level, changing the retrieval process from traditional keyword matching to semantic matching. One approach is query expansion using domain ontology; the other introduces a case-based similarity measure for metadata information retrieval using the Case-Based Reasoning (CBR) approach. Results show improvements over the classic method, over query expansion using a general-purpose ontology, and over a number of other approaches.
International Journal of Engineering and Science Invention (IJESI) inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Semantics-based clustering approach for similar research area detectionTELKOMNIKA JOURNAL
The manual process of searching out individuals in an already existing research field is cumbersome and time-consuming. Prominent and rookie researchers alike tend to seek existing research publications in a field of interest before coming up with a thesis. In the extant literature, automated similar-research-area detection systems have been developed to solve this problem. However, most of them use keyword-matching techniques, which do not sufficiently capture the implicit semantics of keywords, thereby leaving out some research articles. In this study, we propose the use of ontology-based pre-processing, Latent Semantic Indexing, and K-Means clustering to develop a prototype similar-research-area detection system that can be used to determine publications in similar research domains. Our proposed system addresses the challenges of high dimensionality and data sparsity faced by traditional document clustering techniques. The system is evaluated with randomly selected publications from faculties in Nigerian universities, and results show that integrating ontologies in preprocessing provides more accurate clustering results.
A Novel Data mining Technique to Discover Patterns from Huge Text CorpusIJMER
Today, we have far more information than we can handle: from business transactions and scientific data to satellite pictures, text reports, and military intelligence. Information retrieval alone is no longer enough for decision-making. Confronted with huge collections of data, we now have new needs to help us make better managerial choices: automatic summarization of data, extraction of the "essence" of the stored information, and discovery of patterns in raw data. From these needs, data mining for pattern discovery came into existence and became popular. Data mining finds these patterns and relationships using data analysis tools and techniques to build models.
DEVELOPING A CASE-BASED RETRIEVAL SYSTEM FOR SUPPORTING IT CUSTOMERSIJCSEA Journal
This document describes the development of a case-based retrieval system to help IT customers solve problems. It discusses using a case-based reasoning approach where prior solutions and experiences are stored as cases. A conversational case-based reasoning system is developed that allows users to describe problems and receive potential solutions through a dialogue. The system was tested on sample problems and achieved a high success rate of 90% with an average of 7.7 steps to retrieve solutions.
DEVELOPING A CASE-BASED RETRIEVAL SYSTEM FOR SUPPORTING IT CUSTOMERSIJCSEA Journal
The case-based reasoning (CBR) approach is a modern approach adopted for designing knowledge-based expert systems. It depends heavily on stored experiences, which serve as cases that can be employed for solving new problems by retrieving similar cases from the system and reusing their solutions. The approach aims to solve problems by reviewing, processing, and applying past experience. The present study sheds light on a CBR application by developing a new system for assisting Internet users and solving the problems those users face. In addition, the study focuses on the case retrieval stage. It aims to design and build an experience-inquiry system for solving any problem that Internet users might face, using a case-based reasoning (CBR) dialogue, thereby enabling Internet users to solve the problems they encounter when using the Internet. The system developed by the researcher operates by displaying similar cases through a dialogue, using a well-developed algorithm and a review of relevant previous studies. The success rate of the proposed system was found to be high.
A simplified classification computational model of opinion mining using deep ...IJECEIAES
Opinion mining attempts to develop an automated system to determine people's viewpoints toward various entities such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis of natural text can be regarded as a text and sequence classification problem, which poses a high feature space due to the involvement of dynamic information that needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social media data subject to complex and dynamic content. Firstly, a customized preprocessing operation based on natural language processing mechanisms serves as an effective data treatment process toward building quality-aware input data. Then, a suitable deep learning technique, bidirectional long short-term memory (Bi-LSTM), is implemented for opinion classification, followed by a data modelling process in which truncating and padding are performed manually to achieve better data generalization in the training phase. The design and development of the model are carried out in MATLAB. The performance analysis shows that the proposed system offers a significant advantage in terms of classification accuracy and training time, owing to the reduction in feature space achieved by the data treatment operation.
An efficient-classification-model-for-unstructured-text-documentSaleihGero
The document presents a classification model for unstructured text documents that aims to support both generality and efficiency. The model follows the logical sequence of text classification steps and proposes a combination of techniques for each step. Specifically, it uses multinomial naive Bayes classification with term frequency-inverse document frequency (TF-IDF) representation. The model is tested on the 20-Newsgroups dataset, and results show improved precision, recall, and F-score compared to other models.
Machine learning for text document classification-efficient classification ap...IAESIJAI
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach, and it improves text classification performance when combined with estimates provided by conventional classifiers such as Multinomial Naive Bayes (MNB). Combining the similarity between a test document and a category with the estimated value for the category enhances classifier performance. This approach provides a text document categorization method that is both efficient and effective. In addition, methods for determining the proper relationship between a set of words in a document and the document's categorization are also obtained.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publisher running journals for monetary benefit; we are an association of scientists and academics who focus only on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal primarily aims to bring out the research talent and work of scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. The journal aims to cover scientific research in a broad sense rather than a niche area, facilitating researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a shorter time, enabling them to continue their work. All published articles are freely available to scientific researchers in government agencies, to educators, and to the general public. We are taking serious efforts to promote our journal across the globe in various ways, and we are sure that our journal will act as a scientific platform for all researchers to publish their work online.
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHIJDKP
Text mining is an emerging research field evolving from information retrieval. Clustering and classification are two approaches in data mining that may also be used to perform text classification and text clustering; classification is supervised while clustering is unsupervised. In this paper, our objective is to perform text clustering by defining an improved distance metric to compute the similarity between two text files. We use incremental frequent pattern mining to find frequent items and reduce dimensionality. The improved distance metric may also be used to perform text classification. The distance metric is validated for the worst, average, and best case situations [15]. The results show the proposed distance metric outperforms existing measures.
Indexing based Genetic Programming Approach to Record Deduplicationidescitation
In this paper, we present a genetic programming (GP) approach to record deduplication with indexing techniques. Data deduplication is a process in which data are cleaned of duplicate records caused by misspellings, field swaps, or other mistakes or data inconsistencies. This process requires identifying objects that are included in more than one list. The problem of detecting and eliminating duplicated data is one of the major problems in the broad area of data cleaning and data quality in data warehouses, so we need an algorithm that can detect and eliminate the maximum number of duplications. GP with indexing is an optimization technique that helps find the maximum number of duplicates in the database. We used a deduplication function that is able to identify whether two or more entries in a repository are replicas or not. Many industries and systems depend on the accuracy and reliability of databases to carry out operations; therefore, the quality of the information stored in databases can have significant cost implications for a system that relies on that information to function and conduct business. Moreover, clean and replica-free repositories not only allow the retrieval of higher-quality information but also lead to more concise data and to potential savings in the computational time and resources needed to process the data.
Evidence Data Preprocessing for Forensic and Legal AnalyticsCSCJournals
The document discusses best practices for preprocessing evidentiary data from legal cases or forensic investigations for use in analytical experiments. It outlines key steps like identifying the analytical aim or problem based on the case scope or investigation protocol, understanding the case data through assessment and exploration of its format, features, quality, and potential issues. Challenges of working with common text-based case data like emails, social media posts are also discussed. The goal is to clean and transform raw data into a suitable format for machine learning or other advanced analytical techniques while maintaining integrity and relevance to the case.
This research paper addresses the challenges of mining frequent items over data streams with a variable window size and low memory space. To check the point of context change in the streaming transactions, we developed a two-level window structure that supports fixing the window size instantly, controls heterogeneity, and assures homogeneity among the transactions added to the window. To minimize memory utilization and computational cost and to improve process scalability, this design allows fixing the coverage or support at the window level. In this paper, an incremental mining method for frequent item-sets from the window and a context variation analysis approach are introduced. The complete technique presented in this paper is named Mining Frequent Item-sets using Variable Window Size fixed by Context Variation Analysis (MFI-VWS-CVA). There are clear boundaries between frequent and infrequent item-sets in specific item-sets. In this design we used window size change to represent the conceptual drift in an information stream; in other words, whenever the window size cannot be set effectively, the item-set will be infrequent. The experiments we executed and documented proved that the designed algorithm is much more efficient than existing ones.
Research on ontology based information retrieval techniquesKausar Mukadam
The document summarizes and compares three novel ontology-based information retrieval techniques. It discusses a technique for retrieving information in the domain of Traditional Chinese Medicine that uses an ontology to represent concepts and measures concept similarity to sort search results. It also describes a framework for semantic indexing and querying that uses an ontology and entity-attribute-value model to improve scalability, usability, and retrieval performance for transport systems. Additionally, it outlines a semantic extension retrieval model that uses ontology annotation and semantic extension of queries to address limitations of keyword-based search. The techniques are evaluated based on precision and recall measures to analyze their effectiveness compared to traditional methods.
Parallel multivariate deep learning models for time-series prediction: A comp...IAESIJAI
This study investigates deep learning models for financial data prediction and examines whether the architecture of a deep learning model and time-series data properties affect prediction accuracy. Comparing the performance of convolutional neural network (CNN), long short-term memory (LSTM), Stacked-LSTM, CNN-LSTM, and convolutional LSTM (ConvLSTM) when used as a prediction approach to a collection of financial time-series data is the main methodology of this study. In this instance, only those deep learning architectures that can predict multivariate time-series data sets in parallel are considered. This research uses the daily movements of 4 (four) Asian stock market indices from 1 January 2020 to 31 December 2020. Using data from the early phase of the spread of the Covid-19 pandemic that has created worldwide economic turmoil is intended to validate the performance of the analyzed deep learning models. Experiment results and analytical findings indicate that there is no superior deep learning model that consistently makes the most accurate predictions for all states' financial data. In addition, a single deep learning model tends to provide more accurate predictions for more stable time-series data, but the hybrid model is preferred for more chaotic time-series data.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C...IJNSA Journal
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for clinical information systems. We performed a case study in a real-life hospital setting. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data covering the period from August 2011 until January 2012.
A rough set based hybrid method to text categorizationNinad Samel
This document summarizes a hybrid text categorization method that combines Latent Semantic Indexing (LSI) and Rough Sets theory to reduce the dimensionality of text data and generate classification rules. It introduces LSI to reduce the feature space of text documents represented as high-dimensional vectors. Then it applies Rough Sets theory to the reduced feature space to locate a minimal set of keywords that can distinguish document classes and generate multiple knowledge bases for classification instead of a single one. The method is tested on text categorization tasks and shown to improve accuracy over previous Rough Sets approaches.
This document discusses sequential pattern mining, which aims to discover patterns or rules in sequential data where events are ordered by time. It provides background on sequential pattern mining and its applications. The document also discusses related work on mining sequential patterns and rules from time-series data and across multiple sequences. It describes algorithms for efficiently mining sequential patterns at scale from large databases.
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
In this new era, where tremendous information is available on the internet, it is of utmost importance to provide improved mechanisms to extract information quickly and efficiently. It is very difficult for human beings to manually summarize large documents of text. There is thus the problem of searching for relevant documents among the many available, and of absorbing relevant information from them. To solve these two problems, automatic text summarization is necessary. Text summarization is the process of identifying the most important meaningful information in a document or set of related documents and compressing it into a shorter version while preserving its overall meaning. More specifically, Abstractive Text Summarization (ATS) is the task of constructing summary sentences by merging facts from different source sentences and condensing them into a shorter representation while preserving information content and overall meaning. This paper introduces a newly proposed technique for summarizing abstractive newspaper articles based on deep learning.
Similar to Faster Case Retrieval Using Hash Indexing Technique
The Use of Java Swing’s Components to Develop a WidgetWaqas Tariq
A widget is a kind of application that provides a single service such as a map, news feed, simple clock, or battery-life indicator. This kind of interactive software object has been developed to facilitate user interface (UI) design. A user interface function may be implemented using different widgets with the same function. In this article, we present the widget as a platform that is generally used in various applications, such as the desktop, web browser, and mobile phone. We also describe a visual menu of Java Swing's components that will be used to build a widget. It is assumed that we have successfully compiled and run a program that uses Swing components.
3D Human Hand Posture Reconstruction Using a Single 2D ImageWaqas Tariq
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity caused by inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image with robotic applications in mind. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm to estimate the 3D hand posture from eight 2D projected feature points is proposed. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; model-based approach; gesture recognition; human-computer interface; machine vision.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...Waqas Tariq
The camera mouse has been widely used by handicapped persons to interact with computers. Most importantly, a camera mouse must be able to replace all the roles of a typical mouse and keyboard: it must provide all mouse click events and keyboard functions (including all shortcut keys), it must allow users to troubleshoot by themselves, and it must eliminate the neck fatigue that arises when it is used over long periods. In this paper, we propose a camera mouse system with a timer as the left click event and blinking as the right click event. We also modify the original on-screen keyboard layout by adding two buttons (a "drag/drop" button for mouse drag-and-drop events and another button to call the task manager for troubleshooting) and by changing the behavior of the CTRL, ALT, SHIFT, and CAPS LOCK keys in order to provide keyboard shortcuts. In addition, we develop a recovery method that allows users to leave the camera and come back again, eliminating the neck fatigue effect. Experiments involving several users were carried out in our laboratory. The results show that our camera mouse allows users to type, perform left and right click events, drag and drop, and troubleshoot without using their hands. With this system, handicapped persons can use computers more comfortably and with reduced eye dryness.
A Proposed Web Accessibility Framework for the Arab DisabledWaqas Tariq
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework that eases Web access for disabled Arab users and facilitates their lifelong learning. The proposed framework provides the disabled Arab user with an easy means of access in their mother language, so they do not have to overcome the barrier of learning the target spoken language. The framework is based on analyzing the web page's meta-language, extracting its content, and reformulating it in a format suitable for disabled users. The basic objective of this framework is to support the equal right of Arab disabled people to access education and training alongside non-disabled people. Key Words: Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility, Web framework, Web System, WWW.
Real Time Blinking Detection Based on Gabor FilterWaqas Tariq
The document proposes a new method for real-time blinking detection based on Gabor filters. It begins by reviewing existing methods and their limitations in dealing with noise, variations in eye shape, and blinking speed. The proposed method uses a Gabor filter to extract the top and bottom arcs of the eye from an image. It then measures the distance between these arcs and compares it to a threshold: a distance below the threshold indicates a closed eye, while a distance above indicates an open eye. The document claims this Gabor filter-based approach is robust to noise, variations in eye shape and blinking speed. It presents experimental results showing the method can accurately detect blinking across different users.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...Waqas Tariq
A method for computer input using human eyes only, based on two Purkinje images, which works in real time without calibration, is proposed. Experimental results show that cornea curvature can be estimated using Purkinje images derived from two light sources, so no calibration is needed to reduce person-to-person differences in cornea curvature. The proposed system allows users' head movements of 30 degrees in the roll direction and 15 degrees in the pitch direction by utilizing the detected face attitude, which is derived from the face plane consisting of three feature points on the face: the two eyes and the nose or mouth. It is also found that the proposed system works in real time.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...Waqas Tariq
Mobile multimedia service is relatively new but has quickly come to dominate people's lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of the study is to examine the role of Perceived Enjoyment (PE) and which determinants contribute to PE in the context of mobile multimedia service. The results indicate that PE influences Perceived Usefulness (PU) and Perceived Ease of Use (PEOU), and directly influences Behavioral Intention (BI). Aesthetics and flow are key determinants explaining Perceived Enjoyment (PE) in mobile multimedia usage.
Collaborative Learning of Organisational KnolwedgeWaqas Tariq
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative environments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of organisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team environment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Our research aims to propose a global approach for the specification, design and verification of context-aware Human Computer Interfaces (HCI). This is a Model Based Design (MBD) approach. The methodology describes the ubiquitous environment with ontologies; OWL is the standard used for this purpose. The specification and modeling of Human-Computer Interaction are based on Petri nets (PN). This raises the question of representing Petri nets in XML, for which we use the PNML modeling standard. In this paper, we propose an extension of this standard for the specification, generation and verification of HCI. This extension is a methodological approach for the construction of PNML from Petri nets. The design principle uses the composition of elementary Petri net structures as Modular PNML. The objective is to obtain a valid interface through verification of the properties of the elementary Petri nets represented in PNML.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 BoardWaqas Tariq
The main aim of this paper is to build a system capable of detecting and recognizing hand gestures in an image captured by a camera. The system is built on Altera's FPGA DE2 board, which contains a Nios II soft-core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose. Image processing techniques are used to smooth the image in order to ease the subsequent steps of translating the hand sign signal. The algorithm translates numerical hand sign signals, and the result is displayed on the seven-segment display. Altera's Quartus II, SOPC Builder, and Nios II EDS software are used to construct the system. With SOPC Builder, the related components on the DE2 board can be interconnected easily and orderly, compared to the traditional method, which requires lengthy source code and is time-consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, the C programming language is used to code the hand sign translation algorithm. Recognizing hand sign signals from images can help humans control robots and other applications that require only a simple set of instructions, provided a CMOS sensor is included in the system.
An overview on Advanced Research Works on Brain-Computer InterfaceWaqas Tariq
A brain-computer interface (BCI) is a proficient result of research in the field of human-computer synergy, where direct articulation between the brain and an external device occurs, augmenting, assisting, and repairing human cognition. Advanced work, such as generating brain-computer interface switch technologies for intermittent (or asynchronous) control in natural environments, or developing brain-computer interfaces with fuzzy logic systems or wavelet theory to derive their efficacies, is still ongoing, and some useful results have already been found. The need for such brain-machine interfaces is also growing day by day, e.g., for neuropsychological rehabilitation, emotion control, etc. An overview of the control theory and some advanced work in the field of brain-machine interfaces is given in this paper.
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...Waqas Tariq
There is a growing ageing phenomenon with the rise of the ageing population throughout the world. According to the World Health Organization (2002), the population aged 60 and over is expected to grow to 694 million, or by 223%, between 1970 and 2025. The growth is especially significant in advanced regions such as North America, Japan, Italy, Germany, and the United Kingdom. This growing older adult population significantly impacts the socio-culture, lifestyle, healthcare system, economy, infrastructure, and government policy of a nation. However, there are limited research studies on the perception and usage of mobile phones and their services by senior citizens in a developing nation like Malaysia. This paper explores the relationship between mobile phones and senior citizens in Malaysia from the perspective of a developing country. We conducted an exploratory study using contextual interviews with 5 senior citizens on how they perceive their mobile phones. The paper reveals 4 interesting themes from this preliminary study, in addition to findings on the desirable mobile requirements for local senior citizens with respect to health, safety, and communication. The findings bring interesting insights to local telecommunication industries as a whole and will serve as groundwork for more in-depth study in the future.
Principles of Good Screen Design in WebsitesWaqas Tariq
Visual techniques for the proper arrangement of elements on the user screen have helped designers make the screen look good and attractive. Several visual techniques emphasize the arrangement and ordering of screen elements based on particular criteria for the best appearance of the screen. This paper investigates a few significant visual techniques in various web user interfaces and showcases the results for a better understanding of their presence.
This document discusses the progress of virtual teams in Albania. It provides context on virtual teams and how they differ from traditional teams in their reliance on technology for communication across distances. The document then examines the use of virtual teams in Albania, noting the growing infrastructure and technology usage that enables virtual collaboration. It highlights some virtual team examples in Albanian government and academic projects.
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...Waqas Tariq
It is a well-established fact that web applications require frequent maintenance because of cutting-edge business competition. The authors have worked on quality evaluation of web sites in the Indian e-commerce domain, producing a quality-wise ranking of these sites. According to their work, and surveys done by various other groups, the Futurebazaar web site is considered one of the best Indian e-shopping sites. In this research paper, the authors assess the maintenance of that site by incorporating the problems incurred during the evaluation. This exercise presents a real-world web-site maintainability problem and gives a clear picture of all the quality metrics that are directly or indirectly related to the maintainability of the web site.
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...Waqas Tariq
A paradox has been observed whereby web site usability is proven to be an essential element of a web site, yet at the same time there exists an abundance of web pages with poor usability. This discrepancy is the result of limitations that currently prevent web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, enabling a non-expert in the field of usability to conduct the evaluation and thereby reducing the costs associated with such evaluation. Additionally, the framework allows the flexibility of adding, modifying or deleting guidelines without altering the code that references them, since the guidelines and the code are two separate components. A comparison of the evaluation results obtained using the framework against published evaluations carried out by web site usability professionals reveals that the framework is able to automatically identify the majority of usability violations. Due to the consistency with which it evaluates, it identified additional guideline-related violations that were not identified by the human evaluators.
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...Waqas Tariq
A robot-arm-based meal support system, operated by computer input using human eyes only, is proposed. The proposed system is developed for handicapped and disabled persons as well as elderly persons, and it was tested with able-bodied persons with several shapes and sizes of eyes under a variety of illumination conditions. The test results show the proposed system works well for selecting the desired foods and retrieving them according to users' requirements. The proposed system was found to be 21% faster than a manually controlled robot.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also include different speech orders so that the recognizer can build models for all the sounds present. But for the Telugu language, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases comprising several words (that is, tokens) in English map onto a single word in Telugu. The Telugu language is phonetic in nature in addition to being rich in morphology. That is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method results in increased speech recognition accuracy with a large reduction in corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is the task of removing ambiguity from a word, i.e., identifying the correct sense of a word in ambiguous sentences. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Improving the interaction between users and computers is essential for human-computer interaction; for this, supervised and unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding the best suitable domain of a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
From existing research it has been observed that many techniques and methodologies are available for performing every step of an Automatic Speech Recognition (ASR) system, but the performance (minimization of word error rate, WER, and maximization of word accuracy rate, WAR) does not depend only on the technique applied in a given method. The research indicates that performance mainly depends on the category of noise, the level of noise, and the variable sizes of the window, frame, frame overlap, etc. considered in existing methods. The main aim of the work presented in this paper is to use variable parameter sizes (window size, frame size, and frame overlap percentage) to observe the performance of algorithms for various categories and levels of noise, and to train the system for all parameter sizes and categories of real-world noisy environments in order to improve the performance of the speech recognition system. This paper presents the results of signal-to-noise ratio (SNR) and accuracy tests with variable parameter sizes. It is observed that it is very hard to evaluate test results and decide parameter sizes for ASR performance improvement and optimization. Hence, this study further suggests feasible and optimal parameter sizes using a Fuzzy Inference System (FIS) for enhancing accuracy in adverse real-world noisy environmental conditions. This work will be helpful for discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
Faster Case Retrieval Using Hash Indexing Technique
International Journal of Artificial Intelligence and Expert Systems (IJAE), Volume (2), Issue (2), 2011
Mohamad Farhan Mohamad Mohsin farhan@uum.edu.my
College of Arts & Sciences
Universiti Utara Malaysia
Kedah, 06010, Malaysia
Maznie Manaf maznie@kelantan.uitm.edu.my
Faculty of Computer Science & Mathematic
Universiti Teknologi Mara (Kelantan)
Kelantan, 18500, Malaysia
Norita Md Norwawi norita@usim.edu.my
Faculty of Science & Technology
Universiti Sains Islam Malaysia
71800, Nilai, Negeri Sembilan, Malaysia
Mohd Helmy Abd Wahab helmy@uthm.edu.my
Faculty of Electrical and Electronic Engineering
Universiti Tun Hussein Onn Malaysia
Johor, 86400, Malaysia
Abstract
The main objective of case retrieval is to scan the case base and match the most similar old cases to a new problem. Besides accuracy, the time taken to retrieve a case is also important. With the increasing number of cases in the case base, the retrieval task becomes more challenging, where faster retrieval time and good accuracy are the main aims. Traditionally, the sequential indexing method has been applied to search for possible cases in the case base. This technique works fast when the number of cases is small but requires more time to retrieve as the amount of data in the case base grows. As an alternative, this paper presents the integration of the hashing indexing technique into case retrieval to mine large case bases and speed up the retrieval time. Hashing indexing locates a record by determining its index using only an entry's search key, without traversing all records. To test the proposed method, real data, namely the Timah Tasoh Dam operational dataset, which is temporal in nature and represents the historical hydrological data of daily Timah Tasoh dam operation in Perlis, Malaysia, from 1997 to 2005, was chosen for the experiment. The performance of hashing indexing is then compared with the sequential method in terms of retrieval time and accuracy. The findings indicate that hashing indexing is more accurate and faster than the sequential approach in retrieving cases. Besides that, a combined hashing search key produces better results than a single search key.
Keywords: Hashing Indexing, Sequential Indexing, Case Retrieval, Case-Based Reasoning.
1. INTRODUCTION
Case-based reasoning (CBR) is a model of reasoning that mimics how a human deals with an unseen problem. It focuses on the human problem-solving approach, such as how people learn new skills and generate solutions to new situations based on their past experience. In a mechanism similar to a human intelligently adapting experience for learning, CBR replicates the process by treating experiences as a set of old cases and the problem to be solved as a new case. To reach a conclusion, it executes four steps: retrieve the most similar cases, reuse the retrieved cases to solve the problem, revise the reused solution, and finally retain the revised experience in the case base for future decision making. Figure 1 illustrates the CBR decision making process.
FIGURE 1: The CBR decision making processes [13]
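As a rough illustration of this cycle, the following minimal Python sketch (not from the paper) walks through the four steps; the dict-based case representation and the similarity and adapt helpers are assumed placeholders, not the authors' implementation.

    # A minimal sketch (not from the paper) of the four-step CBR cycle
    # described above. Cases are plain dicts; `similarity` and `adapt`
    # are assumed, caller-supplied helpers.

    def cbr_solve(new_case, case_base, similarity, adapt):
        # Retrieve: find the most similar old case.
        best = max(case_base, key=lambda c: similarity(new_case, c))
        # Reuse: adapt the retrieved solution to the new problem.
        solution = adapt(best["solution"], new_case)
        # Revise: in practice the proposed solution is verified/repaired here.
        # Retain: store the solved case for future decision making.
        case_base.append({**new_case, "solution": solution})
        return solution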
Since it was introduced back in the 1970s, CBR has had a significant impact on many domains. For example, the technique is widespread in biology [1], in medicine for diagnostic and therapeutic tasks [2] and treatment [3], in image retrieval [6, 12], project management and planning [7], and education and tutoring [8]. The advantages of CBR, such as flexibility in knowledge modeling and incremental case learning, have made it possible for CBR to be applied to extremely diverse application domains. Due to the complexity of problems, CBR has also been integrated with soft computing techniques such as fuzzy logic [9], neural networks [10], and genetic algorithms [11].
Theoretically, CBR maps the similarity between old and new cases to derive a conclusion. Therefore, the number of old cases is important for leading CBR to produce good decisions [3]. It relies heavily on the quality of old cases, but in practice quality cases are difficult to come by [4], [5]. Nowadays, CBR is capable of storing millions of cases in a case base thanks to advances in data storage technology. In parallel with that scenario, many researchers have studied case retrieval, mainly the case indexing technique, for faster retrieval time. The selection of the indexing type is important because it permits the system to match the right case at the right time [13].
In general, there are two types of indexing structures: sequential and non-sequential indexing. Sequential indexing is the conventional technique that has been applied to search for possible cases in the case base. With the sequential technique, cases are retrieved one by one, following a sequence, until the most similar case is matched. It works fast when the number of cases is small, but a problem arises when the number of cases contained in the case base is huge, as retrieval then consumes more time.
In this study, a new approach to case indexing in CBR is proposed. This study investigates the non-sequential indexing technique called hashing as an alternative to cater for large case bases and achieve faster retrieval time in CBR. Hashing indexing locates a record by determining its index using only an entry's search key, without traversing all records [14]. It utilizes little memory, gives faster retrieval time, and is easier to code compared to other indexing techniques such as tree data structures [15]. This paper presents a review of the literature on both indexing methods and the integration of hashing indexing into case retrieval with the aim of improving retrieval performance. To test the proposed method, real data on daily Timah Tasoh Dam operation was chosen for the experiment. The dataset is temporal data representing the historical hydrological data of daily Timah Tasoh dam operation in Perlis, Malaysia, in the years 1997-2005. The performance of hashing indexing is then compared with the sequential method in terms of retrieval time and accuracy.
This paper is organized as follows. Section 2 outlines the literature on case retrieval and hashing indexing. The integration of the hashing indexing technique into CBR is discussed in Section 3, followed by a discussion of the research design of the study in Section 4. Section 5 describes the experimental data used in this study. In Section 6, the findings and results of the study are presented, and the final sections conclude this work.
2. CASE RETRIEVAL AND HASHING INDEXING
Decision making in CBR starts with case retrieval. It involves the process of finding the cases in the case base that are closest to the new case. For a given new case q = (x, d), where d is the decision to be determined, case retrieval is the process of finding old cases c that are close to q. The matching of c and q is represented as sim(c, q), where c ∈ CB are old cases in the case base and q is the query case. The similarity between the two cases is determined based on the similarity measure sim(c, q).
Two important criteria need to be determined for quality case retrieval: firstly, the mechanism that controls how the case base is searched, and secondly, the suitable search key to guide the search [13]. In reality, the case retrieval process heavily exploits computer memory and is time-consuming due to the searching process over a huge case base. Therefore, the case indexing technique plays a very important role in determining whether the search examines an entire case or only a portion of it. According to [16], indexing and database representation are a fundamental problem for efficient clustering, classification, and data retrieval. The main concerns in case retrieval in CBR are how to assess the similarity between cases in order to retrieve appropriate precedents and how to adapt the old solution to the new case [17]. Besides retrieving the most similar case, minimizing the time consumed during the process is also important.
One of the indexing techniques used in case retrieval is sequential indexing. It is a conventional approach applied in early database technology in which cases are retrieved one by one, following a sequence, until the most similar case is matched. Since it scans the case base sequentially, this method is not efficient when the number of cases in the case base is huge, as it consumes more time to retrieve. As a solution, the hashing indexing method with a search key k was proposed in database technology.
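For contrast, here is a minimal Python sketch (not from the paper) of sequential retrieval; the similarity argument is an assumed scoring helper.

    # A sketch of sequential retrieval: every stored case is scanned in
    # order, so retrieval cost grows linearly with the case base size.

    def sequential_retrieve(query, case_base, similarity):
        best, best_score = None, float("-inf")
        for case in case_base:            # visits all n cases in sequence
            score = similarity(query, case)
            if score > best_score:
                best, best_score = case, score
        return best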
2.1 Hashing Indexing
Hashing indexing is commonly used in database applications during data retrieval. This technique was developed to access large files residing on external storage, not just fixed-size files but files that grow over time. The idea of hashing is to map each record to a hash key using a hash function h(k), where the result is an index into the hash table HT, and HT represents an array of size m. The hash function takes a search key k and produces an integer index representing each case in HT. After that, the case can be directly retrieved at the respective address in HT. The address, or hash of the search key, is generated from the function h(k) = k mod m, where k is the search key, m is the table size, and mod is the modulo operator.
The efficiency of hashing can be seen in memory management. The approach requires a substantially smaller amount of memory and is easier to code compared to the tree data structures of traditional indexing approaches. It also works without requiring a complete file reorganization and rehashing [15]. In practice, h(k) can map two or more search keys to the same address in HT. The HT must then either store more than one item in the same array location or probe for another address. This occurrence is called a collision, and the items involved are often called synonyms [18]. An example of the hashing indexing technique, adopted from [19], is shown in Figure 2.
FIGURE 2: Hashing Indexing Technique [19]
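To make the mapping concrete, the following small Python sketch illustrates the h(k) = k mod m scheme described above; the table size and search keys are invented example values, not from the paper.

    # An illustrative sketch of the mapping h(k) = k mod m.

    m = 11                        # hash table size
    HT = [None] * m               # the hash table as an array of size m

    def h(k):
        return k % m              # hash function: index into HT

    for key, record in [(15, "case A"), (23, "case B"), (7, "case C")]:
        HT[h(key)] = record       # store each record at its hashed address

    print(h(15), h(23), h(7))     # 4 1 7: three distinct slots, no collision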
One limitation arises when the hash table becomes full: performance degrades badly unless separate chaining, which is capable of handling collisions, is used. This is the reason why [18] suggested that HT should never be allowed to get full. To determine whether HT is full, the ratio of the number of entries stored in HT to the table size m needs to be calculated. This ratio is known as the load factor. Generally, the table size should be automatically increased and the records in the table rehashed when the load factor reaches 0.7 (70% full) or 0.8 (80% full) for open addressing [14, 18].
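A small sketch of that rule, using the 0.7 threshold cited above; the function and variable names are illustrative assumptions.

    # Rehash into a larger table once entries / table-size reaches 0.7.

    def needs_rehash(num_entries, table_size, threshold=0.7):
        load_factor = num_entries / table_size
        return load_factor >= threshold

    print(needs_rehash(8, 11))    # True: 8/11 is about 0.73, table too full
    print(needs_rehash(5, 11))    # False: 5/11 is about 0.45, still fine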
Recently, many applications have utilized the hashing mechanism to solve specific problems, for example compilers that use a HT to keep track of declared variables in source code [14, 18, 19]. A HT is ideal for this kind of problem because only two operations are performed, insert and find; identifiers are typically short, so the hash key can be computed quickly. In this application, most searches are successful.
Another common use of the HT is in game programs. As the program searches through different lines of play, it keeps track of positions it has encountered by computing a hash key based on the position (and storing its move for that position). If the same position recurs, usually by a simple transposition of moves, the program can avoid expensive recalculation.
3. THE MERGING OF HASHING INDEXING IN CASE RETRIEVAL
The advantages of hashing indexing in data retrieval are a faster retrieval time and minimized
usage of computer resources. This motivation has led to the merging of hashing indexing into CBR,
since case retrieval requires a fast way to retrieve cases from the case base. Figure 3(a)
depicts the sequential indexing method, the conventional technique practiced in CBR's case
retrieval, while Figure 3(b) depicts the concept of the proposed hashing indexing technique.
FIGURE 3 (a): The Sequential Indexing Method and FIGURE 3 (b): The Hashing Indexing Method
A good hash function h should be fast to compute, minimize collisions, and distribute entries
uniformly through the HT. In the proposed hash model, separate chaining, or closed addressing, is
chosen to resolve collisions. Through this method, a specific location, called a bucket b, is
allowed to store more than one value. A new address can simply be placed into a particular
location, and all the cases sharing the related attribute are placed in the same b. Figure 4
illustrates the separated case locations in the hash table.
FIGURE 4: The Separated Case Locations in the HT
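A minimal sketch of such a bucket layout in C++ follows; the Case fields here are illustrative
assumptions, not the exact record layout of the authors' system:

    #include <vector>

    // Illustrative case record; the field names are assumptions.
    struct Case {
        double waterLevel;   // WL
        double avgRainfall;  // mean of the average rainfalls
        double deltaWL;      // change of water level
        int    gate;         // GT
    };

    // Separate chaining: the hash table is an array of buckets, and each
    // bucket b keeps every case whose search key maps to that location.
    using Bucket    = std::vector<Case>;
    using HashTable = std::vector<Bucket>;

With this layout a collision simply appends to the bucket, so all related cases stay grouped at
one address.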
The modified hashing indexing algorithm in CBR involves two main tasks: storing new cases and
retrieving a case. The process flow in Figure 5 represents the process of storing a new case in
the case base. It starts with calculating the hash key to determine the location in the hash
table, and then stores the current case in the HT. The algorithm for storing a case into the hash
table is shown in Algorithm 1.
FIGURE 5: Storing a Case into Case Base
Algorithm 1: Storing a case into hash table
Input: Timah Tasoh Dam dataset; search key k; size of data n; number of attributes a; the
attributes selected to calculate k; bucket quantity b; range of search key k; counter i
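The body of Algorithm 1 did not reproduce cleanly here, so the following C++ fragment is only a
guessed reconstruction of its flow, derived from the text and Figure 5 (compute the bucket from
the selected attributes, then append the case to bucket b); all names are assumptions:

    #include <vector>

    struct Case { double rain1, rain2, deltaWL; int gate; };  // assumed fields
    using HashTable = std::vector<std::vector<Case>>;

    // Guessed reconstruction of Algorithm 1: derive the bucket from the
    // selected attribute(s), then append the case to bucket b.
    void storeCase(HashTable& ht, const Case& c, int (*bucketOf)(const Case&)) {
        int b = bucketOf(c) % static_cast<int>(ht.size());  // h(k) = k mod M
        ht[b].push_back(c);                                 // chain into bucket b
    }

Here bucketOf stands in for the key-to-bucket rules of Tables 1 and 2 below.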
The case retrieving process based on CBR's hashing indexing is shown in Figure 6. It starts with
calculating the hash key and mapping it to the HT. The result is retrieved by finding the search
key after a new case is entered; the formula h(k) = k mod M is used to find the address in the
HT. Finally, the similarity of the cases in the matching bucket is calculated to obtain the
predicted result. The similarity of the cases is calculated based on the local similarity and the
global similarity. Equations 1 and 2 display the calculation of both similarities.
    sim(n_i, o_i) = 1 − |n_i − o_i| / range_i    (1)

where sim(n_i, o_i) is the local similarity between attribute i of a new case N and an old case
O, and range_i is the value range of attribute i.

    Sim(N, O) = Σ_{i=1..a} w_i × sim(n_i, o_i) / Σ_{i=1..a} w_i    (2)

where Sim(N, O) is the global similarity, N is a new case, O is an old case, a is the number of
attributes compared, sim(n_i, o_i) is the local similarity of attribute i, and w_i is the weight
of attribute i.
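Since Equations 1 and 2 did not reproduce cleanly, the following C++ sketch assumes the standard
weighted nearest-neighbour form of CBR similarity (normalized-difference local similarity and a
weighted-average global similarity); it is an interpretation, not the authors' exact formulas:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Local similarity of attribute i (Equation 1, assumed form): one minus
    // the absolute difference scaled by the attribute's value range.
    double localSim(double n_i, double o_i, double range_i) {
        return 1.0 - std::fabs(n_i - o_i) / range_i;
    }

    // Global similarity of a new case N and an old case O (Equation 2,
    // assumed form): weighted average of the local similarities.
    double globalSim(const std::vector<double>& N,
                     const std::vector<double>& O,
                     const std::vector<double>& range,
                     const std::vector<double>& w) {
        double num = 0.0, den = 0.0;
        for (std::size_t i = 0; i < N.size(); ++i) {
            num += w[i] * localSim(N[i], O[i], range[i]);
            den += w[i];
        }
        return num / den;  // 1.0 means the cases are identical on all attributes
    }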
Figure 6 summarizes this flow: calculate the hash key of the search key to find bucket b in the
hash table; if the bucket contains cases, calculate the similarity of each case and obtain the
highest similarity (result found); otherwise, the result is not found.
FIGURE 6: Retrieving a Case from HT
In this study, three search keys k are defined: the mean of the average rainfalls (m), the change
of water level (ΔWL), and the combination of the mean average rainfall and the change of water
level (m ^ ΔWL), each of which serves as the k in Equation 3. Different k are used to determine
which produces the better result, mainly high accuracy and low retrieval time. The dataset
represents the historical hydrological data of daily Timah Tasoh dam operation in Perlis,
Malaysia, for the years 1997-2005; the next section describes this dataset in detail.
    h(k) = k mod M    (3)

where M is the table size, mod is the modulo operator, and k is the search key computed from
Equation 4 (for m) or Equation 5 (for ΔWL).

To calculate the m search key,

    m = (AR(t−1) + AR(t−2)) / 2    (4)

where AR(t−1) is the average rainfall at time t−1, AR(t−2) is the average rainfall at time t−2,
and t is the time index;

and to calculate the ΔWL search key,

    ΔWL = WL(t) − WL(t−1)    (5)

where WL(t) is the water level at time t, WL(t−1) is the water level at time t−1, and t is the
time index.
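Under the reading of Equations 4 and 5 given above, the two single search keys could be computed
as follows; this is a sketch, and the exact lags and scaling used by the authors may differ:

    // Equation 4 (assumed reading): mean of the average rainfalls at t-1 and t-2.
    double meanRainfallKey(double rainT1, double rainT2) {
        return (rainT1 + rainT2) / 2.0;
    }

    // Equation 5 (assumed reading): change of water level between t and t-1.
    double changeWaterLevelKey(double wlT, double wlT1) {
        return wlT - wlT1;
    }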
Every type of k has a different hash table size, i.e., a different number of buckets b. The
number of b depends on the type of its k. For example, the change of water level (ΔWL) has three
types of water level, which are Alert, Warning, and Danger [15]; therefore, ΔWL has three
buckets. Table 1 shows the key, the number of buckets, and the range of each case. From Table 1,
Figure 7 represents the bucket arrangement of ΔWL.
TABLE 1: Type of Water Level and the Number of b (Search key: ΔWL)

b | Type of water level | Range of ΔWL / m
0 | Alert               | x ≤ 0.0034
1 | Warning             | 0.0034 < x < 0.0061
2 | Danger              | x ≥ 0.0061
FIGURE 7: The Bucket Arrangement Using the ΔWL Key
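The ΔWL ranges in Table 1 translate directly into a bucket-selection rule; a minimal sketch:

    // Map a change of water level (in metres) to its bucket b, following
    // the Alert/Warning/Danger ranges of Table 1.
    int bucketOfDeltaWL(double x) {
        if (x <= 0.0034) return 0;  // Alert
        if (x <  0.0061) return 1;  // Warning (0.0034 < x < 0.0061)
        return 2;                   // Danger  (x >= 0.0061)
    }

The m key maps to its four buckets in the same way using the rainfall ranges of Table 2 below.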
For the mean of average rainfall key, there are four buckets, representing the types of rainfall:
Light, Moderate, Heavy, and Very Heavy. Table 2 elaborates the types of rainfall, while Figure 8
illustrates the bucket arrangement of the m key. Figure 9 portrays the total number of b for the
combination of m and ΔWL as the third search key.
TABLE 2: Type of Rainfall and the Number of b (Search key: m)

b | Type of Rainfall | Range of Rainfall / mm
0 | Light            | x ≤ 11
1 | Moderate         | 11 < x < 32
2 | Heavy            | 32 < x < 62
3 | Very Heavy       | x ≥ 62
FIGURE 8: The Bucket Arrangement Using the m Key
FIGURE 9: The Bucket Arrangement Using the m ^ ΔWL Key
4. RESEARCH DESIGN
This section describes the research design used in this study, which is illustrated in Figure 10.
There are three phases, starting with development, followed by preparing data for mining, and
lastly case mining.
FIGURE 10: The Research Design
The development phase focuses on the algorithm modification. This phase covers three steps:
design development, implementation, and testing. In design development, two approaches, the
sequential indexing and hashing indexing techniques, are designed and integrated into CBR using
Microsoft Visual C++. After that, the model is tested. The aim of the testing is to check the
correctness of the hash table and of the similarity calculation during mining.
The second phase is preparing data for mining, which includes four activities: selection,
pre-processing, transformation, and data partition. The aim of this process is to clean and
prepare the Timah Tasoh Dam dataset before presenting it to the CBR mining system. The selection,
pre-processing, and data transformation processes are explained in Section 5. In data partition,
the experimental data is divided into five folds with different training and testing allocations;
the multiple folds are used to obtain a varied set of results. The folds (training : testing) are
90:10, 80:20, 70:30, 60:40, and 50:50.
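For illustration, a single fold can be produced by a simple index cut; the following C++ sketch
is an assumption about the mechanics, as the paper does not show its partitioning code:

    #include <cstddef>
    #include <vector>

    // Split n cases into training and testing sets by ratio, e.g. 0.6 for 60:40.
    template <typename T>
    void splitFold(const std::vector<T>& data, double trainRatio,
                   std::vector<T>& train, std::vector<T>& test) {
        std::size_t cut = static_cast<std::size_t>(data.size() * trainRatio);
        train.assign(data.begin(), data.begin() + cut);
        test.assign(data.begin() + cut, data.end());
    }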
The last phase is case mining. It involves mining the Timah Tasoh Dam dataset with both indexing
methods. During the experiment, two measurement metrics are recorded, accuracy and retrieval
time, and their results are compared. In order to measure accuracy, the algorithm is tested using
the various data partitions, taking cases in the case base as the test set. The measurements are
adopted from [15]. This is due to the fact that the real dataset consists of unbalanced data,
where the number of event occurrences is lower than the number of non-event occurrences. The
accuracy of the model is evaluated based on Equation 6.
    Accuracy (%) = (TP + TN) / (TP + FP + TN + FN) × 100    (6)

where TP is the number of events correctly predicted, FP is the number of predicted events that
are actually non-events, TN is the number of non-events correctly predicted, and FN is the number
of predicted non-events that are actually events.
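Reading Equation 6 as the usual confusion-matrix accuracy, it can be computed as below; the
TP/FP/TN/FN names are assumptions standing for the four counts described above:

    // Equation 6 (assumed form): percentage of correctly predicted cases.
    // tp = events correctly predicted; fp = predicted event, actually non-event;
    // tn = non-events correctly predicted; fn = predicted non-event, actually event.
    double accuracyPercent(int tp, int fp, int tn, int fn) {
        return 100.0 * (tp + tn) / (tp + fp + tn + fn);
    }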
The second measurement is retrieval time, which refers to the time taken to search for the most
similar case in the case base. It is tested by selecting one case from the case base and
measuring the retrieval of that case under both the hashing and the sequential technique. The
retrieval time is recorded five times before the average is calculated. A special loop is used to
perform this task, as shown in the code in Figure 11.
FIGURE 11: Algorithm to Calculate Retrieval Time
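The code of Figure 11 is not reproduced here; a minimal sketch of such a timing loop, averaging
five runs with the C++ <chrono> clock, might look as follows (retrieveCase is a stand-in for
either retrieval routine):

    #include <chrono>
    #include <iostream>

    // Stand-in for either retrieval routine (hashing or sequential).
    void retrieveCase() { /* ... */ }

    int main() {
        using clock = std::chrono::steady_clock;
        const int runs = 5;   // the time is recorded five times
        double totalMs = 0.0;
        for (int i = 0; i < runs; ++i) {
            auto start = clock::now();
            retrieveCase();
            auto end = clock::now();
            totalMs += std::chrono::duration<double, std::milli>(end - start).count();
        }
        std::cout << "average retrieval time: " << totalMs / runs << " ms\n";
        return 0;
    }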
5. TIMAH TASOH DAM DATASET
The experimental dataset used in this study has 15 attributes of temporal data and is called the
Timah Tasoh Dam dataset. It comprises the historical hydrological data of daily Timah Tasoh dam
operation in Perlis, Malaysia, for the years 1997-2005. A preliminary observation of the raw
dataset found that some attributes were not related to the study and that certain values were
missing. Therefore, the dataset was pre-processed using a temporal data series approach adopted
from [14, 15].
During data preprocessing, only relevant attributes were selected. Out of the 15 attributes, 4
were chosen: current water level, average rainfall, current change of water level, and current
gate. These attributes represent the reservoir water level, the rainfall measurements from 6
telemetry stations (Padang Besar, Tasoh, Lubuk Sireh, Kaki Bukit, Wang Kelian, and Guar Jentik),
and the number of spillway gates. A spillway gate is a structure that makes it possible for
excess water to be released from the dam. Timah Tasoh has six gates, and normally the water is
released using Gate 2, Gate 4, and Gate 6, depending on the situation. The selection is made
using a sliding window technique adopted from [14, 15]. After that, the data are re-scaled into a
suitable representation to increase mining speed and minimize memory allocation. Table 3 is a
sample of clean data ready for mining with the CBR hashing indexing and CBR sequential indexing
models. Based on the table, the current water level (WL), the average rainfall at t−1 and t−2,
the current change of water level (ΔWL), and the current gate (GT) are the final inputs to be
mined using CBR.
TABLE 3: A Sample of the Timah Tasoh Dam Dataset After Pre-processing

GT | WL     | Average Rainfall (t−1) | Average Rainfall (t−2) | ΔWL
2  | 29.275 | 7.33                   | 5.375                  | 0.0007
2  | 29.025 | 22.75                  | 11                     | 0.001
4  | 28.9   | 61.6                   | 67.17                  | 0.0075
4  | 28.895 | 21.6                   | 39.7                   | 0.0096
4  | 28.995 | 14                     | 5.33                   | 0.0007
2  | 29.32  | 17.5                   | 32                     | 0.0057
6. RESULTS & FINDINGS
This section reports the findings of the integration of the hashing indexing technique in case
retrieval. The tested model was compared with a case retrieval function embedded with the
sequential indexing technique. As elaborated in Section 4, the evaluation is conducted using two
criteria: the accurateness of the model in obtaining similar cases, and how fast it retrieves
cases. The notation of the experiment is given as follows: the accuracy of the mining is reported
in %, and the retrieval time in milliseconds is reported as Ms. The results of the experiment are
presented in Table 3.
TABLE 3: The Mining Results of the Hashing and Sequential Indexing Techniques in Ms and %

Data      | Sequential Indexing | Hashing Indexing Technique (Search Key k)
Partition | Technique           | m            | ΔWL          | m ^ ΔWL
          | Ms     | %          | Ms     | %   | Ms     | %   | Ms     | %
90 : 10   | 15.27  | 50         | 15.09  | 75  | 14.36  | 50  | 13.96  | 75
80 : 20   | 12.03  | 38         | 11.68  | 38  | 11.09  | 50  | 10.41  | 57
70 : 30   | 10.31  | 46         | 10.26  | 38  | 10.02  | 46  | 9.95   | 42
60 : 40   | 9.85   | 47         | 9.74   | 35  | 9.02   | 41  | 8.96   | 50
50 : 50   | 8.69   | 38         | 8.64   | 38  | 8.49   | 52  | 8.02   | 61
The analysis starts with the retrieval time of both methods. The results indicate that the
hashing indexing method required less time for case retrieval in all experiments. For example, in
the 60:40 fold, the sequential technique needs 9.85 ms to map all cases, whereas the hashing
indexing technique takes less time with every search key (m = 9.74 ms, ΔWL = 9.02 ms,
m ^ ΔWL = 8.96 ms). Moreover, the findings also reveal that the combined hashing search key
m ^ ΔWL is the most efficient key, mining cases faster than either single search key. The graph
in Figure 12 summarizes the retrieval times of both methods, and Figure 13 shows the retrieval
times in the 60:40 fold as discussed in this paragraph.
FIGURE 12: The Retrieval Time Taken by the Hashing and Sequential Indexing Techniques
FIGURE 13: Retrieval Time Taken in the 60:40 Fold
Next, the accurateness of CBR in predicting a new case is evaluated. In this analysis, the CBR
model with the hashing indexing technique leads in accuracy. The graph in Figure 14 summarizes
the accuracy of both methods. As in the retrieval time evaluation, the m ^ ΔWL search key
outperformed the single search keys m and ΔWL; it consistently obtains high accuracy in all folds
except 70:30. Interestingly, the results also indicate that the sequential indexing technique is
capable of good accuracy: it overcomes the hashing indexing in the 70:30 fold with 46% accuracy,
leaving behind m (38%) and m ^ ΔWL (42%).
FIGURE 14: The Accuracy of Hashing and Sequential Indexing
Table 4 summarizes the best technique over the whole set of experiments. The best technique is
selected based on the highest accuracy and the shortest time taken to mine the Timah Tasoh Dam
dataset. The table clearly indicates that the hashing indexing method retrieved cases faster than
the sequential method, with the combination search key m ^ ΔWL as the best search key. In terms
of accuracy, hashing indexing scored higher than the sequential technique: out of 5 folds,
hashing indexing obtained better accuracy in 4 folds, the exception being the 70:30 fold, where
sequential indexing generated accuracy similar to ΔWL. Lastly, the combination search key
m ^ ΔWL is chosen as the best search key due to its capability to generate high accuracy and
retrieve cases faster for the Timah Tasoh Dam dataset.
TABLE 4: The Summarization of the Best Technique Based on Accuracy and Retrieval Time

Data Partition | Best Accuracy       | Best Case Retrieval Time
90 : 10        | m and m ^ ΔWL       | m ^ ΔWL
80 : 20        | m ^ ΔWL             | m ^ ΔWL
70 : 30        | Sequential and ΔWL  | m ^ ΔWL
60 : 40        | m ^ ΔWL             | m ^ ΔWL
50 : 50        | m ^ ΔWL             | m ^ ΔWL
7. CONCLUSION
This research integrates the hashing indexing technique into case retrieval with the aim of
catering for the large number of cases stored in the case base and achieving a faster retrieval
time. Its performance is compared with that of the sequential indexing technique using two
criteria: accuracy and retrieval time. From the experiment on the temporal dataset called Timah
Tasoh Dam, hashing indexing is more accurate and faster than sequential indexing in retrieving
cases. The findings of this study offer an alternative technique for case base representation and
case retrieval. They can also assist future miners in mining cases faster, obtaining better
accuracy, and minimizing computer resource usage. For future study, the case retrieval with
hashing indexing approach will be tested with other types of data from various domains.
8. REFERENCES
[1] I. Jurisica, and J.I. Glasgow. “Applications of case-based reasoning in molecular biology”.
AI Magazine, American Association for Artificial Intelligence, vol. 25(1), pp. 85-95, 2004.
[2] R. Schmidt and L. Gierl. “Case-based Reasoning for Medical Knowledge-based Systems”.
International Journal of Medical Informatics, vol. 64, pp. 355, 2000.
[3] Z. Yang, Y. Matsumura, S. Kuwata, H. Kusuoka, and H. Takeda. “Similar Cases Retrieval From
the Database of Laboratory Test Results”. Journal of Medical Systems, vol. 27, pp. 271-282,
2003.
[4] E. Armengol, S. Ontanon, and E. Plaza. “Explaining Similarity in CBR”. Artificial
Intelligence Review, vol. 24, 2002.
[5] R. Pan, Q. Yang, and S.J. Pan. “Mining Competent Case Bases for Case-Based Reasoning”.
Artificial Intelligence, vol. 171, 2007.
[6] D. O'Sullivan, E. McLoughlin, M. Bertolotto, and D.C. Wilson. “Capturing and reusing
case-based context for image retrieval,” in Proc. of the 19th International Joint Conference
on Artificial Intelligence, 2005.
[7] E. Mendes, N. Mosley, and S. Counsell. “The Application of Case-Based Reasoning to Early Web
Project Cost Estimation,” in Proc. of the 26th Annual International Computer Software and
Applications Conference (COMPSAC'02), 2002.
[8] K.S. Leen and B. Todd. “Integrating Case-Based Reasoning and Meta-Learning for a
Self-Improving Intelligent Tutoring System”. International Journal of Artificial Intelligence
in Education, vol. 18(1), pp. 27-58, 2008.
[9] C.K.P. Wong. “Web access path prediction using fuzzy case-based reasoning”. PhD Thesis,
Hong Kong Polytechnic University, Hong Kong, 2003.
[10] J.M. Corchado and B. Lees. “Adaptation of cases for case-based forecasting with neural
network support,” in Soft Computing in Case Based Reasoning, 1st ed., vol. 1, S.K. Pal,
T.S. Dillon, and D.S. Yeung, Eds. London: Springer-Verlag, 2001, pp. 293-320.
[11] K.S. Shin and I. Han. “Case-based reasoning supported by genetic algorithm for corporate
bond rating”. Expert Systems with Applications, vol. 1266, pp. 1-12, 1997.
[12] H. Hamza, Y. Belaid, and A. Belaid. “A case-based reasoning approach for unknown class
invoice processing,” in Proc. of the IEEE International Conference on Image Processing
(ICIP), 2007, pp. 353-356.
[13] S.K. Pal and S.C.K. Shiu. Foundations of Soft Case-Based Reasoning. John Wiley & Sons Inc,
2004, pp. 1-32.
[14] F. M. Carrano, and W. Savitch. Data Structures and Abstractions with Java. USA: Pearson
Education, 2003.
[15] M. Griebel and G. Zumbusch. “Hash-Storage Techniques for Adaptive Multilevel Solvers
and Their Domain Decomposition Parallelization”. In Proc. of Domain Decomposition
Methods 10 (DD10), 1998.
[16] X. He, D. Cai, H. Liu, and W. Ma. “Locality Preserving Indexing for Document
Representation,” in Proc. of the 27th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, 2004.
[17] E. Armengol, S. Ontanon, and E. Plaza. “Explaining Similarity in CBR”. Artificial
Intelligence Review, vol. 24(2), 2004.
[18] W.D. Maurer and T.G. Lewis. “Hash Table Methods”. ACM Computing Surveys (CSUR), vol. 7(1),
pp. 5-19, 1975.
[19] N.M. Darus, Y. Yusof, H. Mohd, and F. Baharom. “Struktur data dan algoritma
menggunakan java”. Selangor, Malaysia: Pearson Prentice Hall, vol. 1, 2003.