Mining biological data is an emerging area at the intersection of bioinformatics and data mining
(DM). The intelligent agent based model is a popular approach to constructing Distributed Data Mining
(DDM) systems that address scalable mining over large-scale distributed data. The nature of the associations
between different amino acids in proteins has also been a subject of great interest. There is a strong need to
develop new models to exploit and analyze the available distributed biological data sources. In this study,
we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative
Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). The
globally strong association rules it mines enhance understanding of protein composition and are desirable for the
synthesis of artificial proteins. A real protein data bank is used to validate the system.
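The abstract above does not include code; as a rough, hypothetical illustration of the kind of quantitative association rule such a system could evaluate (the sequences, intervals and thresholds below are invented, not taken from the paper), a rule relating amino-acid abundance intervals can be scored with plain support and confidence:

```python
# Hypothetical sketch (not the AeQARM-AAPDB implementation): scoring a
# quantitative association rule of the form
#   "fraction of leucine (L) in [0.05, 0.40]  =>  fraction of alanine (A) in [0.05, 0.40]"
# over a tiny stand-in protein data bank, using plain support and confidence.
from collections import Counter

proteins = {                      # toy stand-in for a protein data bank
    "P1": "MALWMRLLPLLALLALWGPDPAAA",
    "P2": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "P3": "MLLAVLYCLLWSFQTSAGHFPRA",
}

def aa_fraction(seq, aa):
    """Fraction of the sequence made up of the given amino acid."""
    return Counter(seq)[aa] / len(seq)

def in_interval(x, lo, hi):
    return lo <= x <= hi

# Quantitative items: (amino acid, lower bound, upper bound) -- illustrative only.
antecedent = ("L", 0.05, 0.40)
consequent = ("A", 0.05, 0.40)

n = len(proteins)
both = sum(
    in_interval(aa_fraction(s, antecedent[0]), *antecedent[1:])
    and in_interval(aa_fraction(s, consequent[0]), *consequent[1:])
    for s in proteins.values()
)
ante = sum(
    in_interval(aa_fraction(s, antecedent[0]), *antecedent[1:])
    for s in proteins.values()
)

support = both / n
confidence = both / ante if ante else 0.0
print(f"support={support:.2f} confidence={confidence:.2f}")
# A rule would count as "globally strong" only if support and confidence
# exceed chosen thresholds across all distributed sites, per the paper's goal.
```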
A SERIAL COMPUTING MODEL OF AGENT ENABLED MINING OF GLOBALLY STRONG ASSOCIATI...ijcsa
The intelligent agent based model is a popular approach to constructing Distributed Data Mining (DDM) systems that address scalable mining over large-scale and ever-increasing distributed data. In an agent-based
distributed system, a variety of agents coordinate and communicate with each other to perform the various
tasks of the Data Mining (DM) process. In this study a serial computing model of a multi-agent system
(MAS) called Agent enabled Mining of Globally Strong Association Rules (AeMGSAR) is presented, based
on the serial itinerary of the mobile agents. A running environment is also designed for the implementation and performance study of the AeMGSAR system.
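How the serial itinerary works internally is not described in the abstract; purely as a sketch of the idea (site names, databases and the support threshold are invented), a mobile agent can be pictured as visiting each site in turn while carrying accumulated counts, with globally strong itemsets decided only after the last hop:

```python
# Illustrative serial itinerary: an "agent" visits each site one after another,
# carrying accumulated item counts; globally strong items are those whose
# combined support meets the global threshold. Purely a sketch of the idea.
from collections import Counter

sites = {                       # local transaction databases at three sites
    "site1": [{"a", "b"}, {"a"}, {"b", "c"}],
    "site2": [{"a", "b"}, {"b"}, {"a", "b", "c"}],
    "site3": [{"a"}, {"a", "b"}],
}

carried = Counter()             # state the mobile agent carries along its route
total = 0
for name, db in sites.items():  # serial itinerary: one site after another
    for t in db:
        total += 1
        for item in t:
            carried[item] += 1

min_support = 0.5
globally_strong = {i for i, c in carried.items() if c / total >= min_support}
print(globally_strong)
```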
In today’s world there is wide availability of huge amounts of data, and thus there is a need to turn this
data into useful information, referred to as knowledge. This demand for the knowledge discovery
process has led to the development of many algorithms used to determine association rules. One of the
major problems faced by these algorithms is the generation of candidate sets. The FP-Tree algorithm is one of
the most preferred algorithms for association rule mining because it produces association rules without
generating candidate sets. But in the process of doing so, it generates many CP-trees, which decreases its
efficiency. In this research paper, an improved FP-tree algorithm with a modified header table, along
with a spare table and the MFI algorithm for association rule mining, is proposed. This algorithm generates
frequent itemsets without using candidate sets or CP-trees.
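The modified header table, spare table and MFI algorithm are not detailed in the abstract; for orientation only, the following minimal sketch builds a plain FP-tree with an ordinary header table, which is the structure the improvement starts from:

```python
# Minimal FP-tree construction sketch (illustrative only; it is not the
# paper's improved algorithm with a modified header table and spare table).
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # 1st scan: item frequencies; keep only frequent items.
    freq = Counter(i for t in transactions for i in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)          # header table: item -> node links

    # 2nd scan: insert each transaction, items ordered by descending frequency.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for i in items:
            if i not in node.children:
                child = Node(i, node)
                node.children[i] = child
                header[i].append(child)
            node = node.children[i]
            node.count += 1
    return root, header, freq

transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
root, header, freq = build_fp_tree(transactions, min_support=2)
for item, nodes in header.items():
    print(item, sum(n.count for n in nodes))   # support recovered from the tree
```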
A Performance Based Transposition algorithm for Frequent Itemsets GenerationWaqas Tariq
The Association Rule Mining (ARM) technique is used to discover interesting associations or correlations among a large set of data items. It plays an important role in generating frequent itemsets from large databases. Many industries are interested in developing association rules from their databases due to the continuous retrieval and storage of huge amounts of data. The discovery of interesting association relationships among business transaction records can help in many business decision-making processes such as catalog design, cross-marketing, and loss-leader analysis. It is also used to extract hidden knowledge from large datasets. ARM algorithms such as Apriori and FP-Growth require repeated scans over the entire database. The input/output overheads generated by repeatedly scanning the entire database degrade CPU, memory and I/O performance. In this paper, we propose a Performance Based Transposition Algorithm (PBTA) for frequent itemset generation and compare the proposed algorithm with the Apriori algorithm. The CPU and I/O overhead are reduced in the proposed algorithm, and it is much faster than other ARM algorithms.
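The abstract does not give PBTA's pseudocode; one common way to realise the "transposition" idea it alludes to is a vertical layout in which each item maps to the set of transaction IDs containing it, so that itemset supports come from set intersections rather than repeated database scans. A hedged sketch of that general idea:

```python
# Illustrative vertical (transposed) layout: item -> set of transaction ids.
# This is a generic sketch of the idea, not the published PBTA pseudocode.
from itertools import combinations

transactions = {
    1: {"bread", "milk"},
    2: {"bread", "butter"},
    3: {"milk", "butter", "bread"},
    4: {"milk"},
}

# Transpose once: a single pass over the database.
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

min_support = 2
frequent_1 = {i for i, t in tidsets.items() if len(t) >= min_support}

# Support of any 2-itemset is just an intersection of tidsets -- no rescans.
for a, b in combinations(sorted(frequent_1), 2):
    support = len(tidsets[a] & tidsets[b])
    if support >= min_support:
        print(f"{{{a}, {b}}} support={support}")
```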
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be an anomalous intrusion. In the literature, several techniques and
algorithms have been discussed for anomaly detection. In most cases, true positive and false positive
parameters have been used to compare their performance. However, depending upon the application, a
wrong true positive or wrong false positive may have severe detrimental effects. This necessitates the inclusion
of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99,
has a huge volume of data, which in turn requires a certain amount of pre-processing. Our work in this paper starts
with enumerating the necessity of cost-sensitive analysis with some real-life examples. After discussing
KDD-CUP-99, an approach is proposed for feature elimination and then feature selection, to reduce the
feature set to the more relevant features directly and the size of KDD-CUP-99 indirectly. From the reported
literature, general methods for anomaly detection are selected which perform best for different types of
attacks. These different classifiers are combined to form an ensemble. A cost-opportunistic technique is
suggested to allocate relative weights to the classifier ensemble for generating the final result. A cost-sensitivity
analysis of the true positive and false positive results is carried out, and a method is proposed to select the elements
of the cost-sensitivity matrix to further improve the results and achieve better overall performance. The
impact on the performance trade-off due to incorporating cost sensitivity is discussed.
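No formulas are given in the abstract for combining the weighted ensemble with the cost matrix; the following sketch shows one plausible reading, with entirely hypothetical weights and costs, where the ensemble's weighted vote is turned into a minimum-expected-cost decision:

```python
# Hypothetical cost-weighted ensemble vote for a binary IDS decision
# (attack vs. normal). Weights and the cost matrix are illustrative only.
import numpy as np

# Per-classifier probabilities that the connection is an attack.
clf_probs = np.array([0.9, 0.4, 0.7])       # e.g. tree, k-NN, SVM outputs
weights   = np.array([0.5, 0.2, 0.3])       # relative weights of the ensemble

p_attack = float(weights @ clf_probs)       # weighted soft vote

# Cost matrix: rows = true class (normal, attack), cols = predicted class.
# A missed attack (false negative) is assumed far costlier than a false alarm.
cost = np.array([[0.0, 1.0],                 # true normal: false alarm costs 1
                 [10.0, 0.0]])               # true attack: missed attack costs 10

# Choose the prediction with the smaller expected cost.
expected_cost = [
    (1 - p_attack) * cost[0, pred] + p_attack * cost[1, pred]
    for pred in (0, 1)
]
prediction = int(np.argmin(expected_cost))
print("predict", "attack" if prediction else "normal", expected_cost)
```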
The advent of social networks has changed research in computer science. Massive volumes of data are now present in the form of Twitter, Facebook, email, and the IoT (Internet of Things), so the storage and analysis of these data have become a great challenge for researchers. Traditional frameworks have failed at processing such large data. R is an open-source programming framework developed for the analysis of large data that yields better accuracy, and it also gives the opportunity of implementation in the R programming language. In this paper, a study on the use of R for the classification of large social network data is presented. The Naïve Bayes algorithm is used for the classification of a large Twitter dataset. The experiment has shown that an enormous amount of data can be satisfactorily classified using the R framework, with promising results.
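The experiments above are carried out in R; purely to illustrate the same bag-of-words plus Naïve Bayes workflow in another language, here is a small scikit-learn sketch with invented tweets and labels:

```python
# Illustrative Naive Bayes text classification (the abstract's experiments are
# in R; this Python/scikit-learn sketch only mirrors the general workflow).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    "love this new phone, battery is great",
    "terrible service, never flying this airline again",
    "what a beautiful morning",
    "worst update ever, app keeps crashing",
]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["the battery life is great"]))   # expected: positive
```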
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
The current trend in industry is to analyze large data sets and apply data mining and machine learning techniques to identify patterns. But the challenge with huge data sets is the high dimensionality associated with them. Sometimes in data analytics applications, large amounts of data produce worse performance. Also, most data mining algorithms are implemented column-wise, and too many columns restrict performance and make them slower. Therefore, dimensionality reduction is an important step in data analysis. Dimensionality reduction is a technique that converts high-dimensional data into a much lower dimension, such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural network techniques for data reduction. Each method has a different rationale for preserving the relationship between input parameters during analysis. Principal Component Analysis, a multivariate technique, and the Self-Organising Map, a neural network technique, are presented in this paper. Also, a hierarchical clustering approach has been applied to the reduced data set. A case study of air quality measurement has been considered to evaluate the performance of the proposed techniques.
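As a minimal illustration of the pipeline described (dimensionality reduction followed by hierarchical clustering), the sketch below uses PCA and Ward linkage on synthetic stand-in data; the Self-Organising Map branch of the paper is not reproduced:

```python
# Illustrative PCA-based dimensionality reduction followed by hierarchical
# clustering (the paper also uses a Self-Organising Map, not shown here).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))          # stand-in for air-quality measurements

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_std)    # keep the first 3 principal components

Z = linkage(X_reduced, method="ward")   # agglomerative (hierarchical) clustering
clusters = fcluster(Z, t=4, criterion="maxclust")

print("explained variance ratio:", pca.explained_variance_ratio_)
print("cluster sizes:", np.bincount(clusters)[1:])
```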
Comparative analysis of association rule generation algorithms in data streamsIJCI JOURNAL
Data mining technology is engaged in extracting useful and previously unknown information from huge databases. Generally, data mining methods are used for knowledge extraction from static databases; currently available data mining techniques are not appropriate for, and have a number of limitations in, managing dynamic databases. A data stream manages dynamic data sets, and it has become one of the essential research domains in data mining. The fundamental definition of a data stream is an arrival of continuous and unlimited data which may not be stored fully because it needs more storage capacity. In
order to perform data analysis on such data, many new data mining techniques are required. Data analysis is carried out using clustering, classification, frequent itemset mining and association rule generation. Association rule mining is one of the significant research problems in data streams, which helps to find the relationships between data items in transactional databases. This research work concentrated on how traditional algorithms can be used for generating association rules in data streams. The algorithms used in this work are Assoc Outliers, Frequent Itemsets and Supervised Association Rule. The number of rules generated by an algorithm and its execution time are considered as the performance factors. Experimental results show that the Frequent Itemset algorithm's efficiency is better than that of the Assoc Outliers and Supervised Association Rule algorithms. The implementation work is executed in the Tanagra data mining tool.
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
Biclusters are required for analyzing the gene expression patterns of genes, by comparing rows in expression profiles, and for analyzing the expression profiles of samples, by comparing columns in the gene expression matrix. In the process of biclustering we need to cluster both genes and samples. The algorithm presented in this paper is based upon the two-way clustering approach, in which the genes and samples are clustered using parallel fuzzy C-means clustering with the message passing interface; we call it MFCM. MFCM is applied for clustering genes and samples so as to maximize the membership function values of the data set. It is a parallelized rework of a fuzzy two-way clustering algorithm for microarray gene expression data [9], intended to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data to find biclusters. The method is able to obtain highly correlated biclusters of the gene expression dataset.
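The MPI parallelisation and the gene-entropy filtering are beyond a short example; the sketch below shows only the serial fuzzy C-means core (membership and centroid updates) that MFCM parallelises and applies along both the gene and sample dimensions:

```python
# Serial fuzzy C-means core (illustrative; MFCM parallelises these updates
# with MPI and applies them to both the gene and sample dimensions).
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids

X = np.vstack([np.random.randn(20, 5) + 3, np.random.randn(20, 5) - 3])
U, C = fuzzy_c_means(X)
print(U.argmax(axis=1))   # hard labels derived from the fuzzy memberships
```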
Mining of Prevalent Ailments in a Health Database Using Fp-Growth AlgorithmWaqas Tariq
Health databases are characterised by a large number of attributes, such as personal biological and diagnostic information, health history, prescriptions, billing information and so on. The increasing need to provide an enhanced medical system has necessitated adopting an efficient data mining technique for extracting hidden and useful information from health databases. In the past, many data mining algorithms such as Apriori, Eclat and H-Mine have been developed, with deficiencies in the time-space trade-off. In this work, an enhanced FP-growth frequent pattern mining algorithm coined FP-Ail is applied to a students' health database with a view to providing information about prevalent ailments and suggestions for managing the identified ailments. FP-Ail is tested on the students' health database of a tertiary institution in Nigeria, and the results obtained could be used by the management of the health centre for enhanced strategic decision making about health care. FP-Ail also provides the possibility to refine the minimum support threshold interactively and to see the changes instantly.
An intrusion detection algorithm for amiIJCI JOURNAL
Nowadays, using smart metering devices for energy users to manage a wide variety of subscribers,
including meter reading, billing, and the connection and disconnection of subscribers,
is an important issue. The performance of these intelligent systems is based on information
transfer in the context of information technology, so data reported from the network should be managed to
avoid malicious activities, including issues that could affect the quality of service of the system. In
this paper, to control the reported data and to ensure the veracity of the obtained information, an
intrusion detection system based on the support vector machine (SVM) and principal component
analysis (PCA) is proposed to recognize and identify intrusions and attacks in the smart grid. Here, the operation of
intrusion detection systems for different SVM kernels, when SVM and PCA are used
simultaneously, is studied. To evaluate the algorithm, numerical simulation based on the KDD99 data is done
on five different kernels for an intrusion detection system using SVM together with PCA.
A comparative analysis of the presented intrusion detection algorithm is also carried out in
terms of response time, the rate of increase in network efficiency, the increase in system error, and
the difference between using and not using PCA. The results indicate that the correct detection rate and the attack error
detection rate have their best values when PCA is used and the kernel of the algorithm is of the radial type, which in the SVM
algorithm reduces the time needed for data analysis and enhances the performance of intrusion detection.
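As a rough illustration of the experimental setup (PCA followed by an SVM evaluated with different kernels), the sketch below runs a scikit-learn pipeline over synthetic data standing in for KDD99; the paper studies five kernels, while the four standard string kernels of scikit-learn are shown here:

```python
# Illustrative PCA + SVM pipeline evaluated with several kernels (the paper
# uses KDD99; synthetic data stands in for it here).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel=kernel))
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:8s} accuracy={score:.3f}")
```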
The premise of this paper is to discover frequent patterns through the use of data grids in the WEKA 3.8 environment. Workload imbalance occurs due to the dynamic nature of grid computing, hence data grids are used for the creation and validation of data. Association rules are used to extract useful information from large databases. In this paper the researcher generates the best rules by using WEKA 3.8 for better performance. WEKA 3.8 is used to obtain the best rules and to implement various algorithms.
Construction of a compact FP-tree ensures that subsequent mining can be performed with a rather compact data structure. This does not automatically guarantee that mining will be highly efficient, since one may still encounter the combinatorial problem of candidate generation if one simply uses this FP-tree to generate and check all the candidate patterns. We study how to explore the compact information stored in an FP-tree, develop the principles of frequent-pattern growth by examination of our running example, explore how to perform further optimization when there exists a single prefix path in an FP-tree, and propose a frequent-pattern growth algorithm, FP-growth, for mining the complete set of frequent patterns using the FP-tree.
SCCAI- A Student Career Counselling Artificial Intelligencevivatechijri
As education is growing day by day, competition has prompted a need for students to
understand more about the educational field. Often the counselor is not available, and
sometimes there is a lack of proper knowledge about some educational field, which creates
misconceptions about that field. This makes it hard for the student to decide on a proper educational trajectory, and
guidance is not always useful. The proposed paper overcomes these problems using machine learning
algorithms. Various algorithms were considered, and the ones best suited for our project are used
here. There are three major problems that come across our path, and they are solved using Random Forest, Linear
Regression and a searching algorithm using the Google API. First, the searching algorithm solves the problem of
location by segregating colleges location-wise; then Random Forest provides the list of colleges using
stream and percentage range; and finally Linear Regression predicts the current cutoff using previous years'
data. Besides this, the proposed system also provides information regarding all fields of education, helping
students to understand and know their field of interest better. The idea is completely fresh, with
no existing projects of a similar kind. This project will guide students throughout.
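Of the three components, the cutoff prediction is the most self-contained; a minimal sketch of linear regression over previous years' cutoffs (all values invented) looks like this:

```python
# Illustrative linear regression predicting the current year's admission cutoff
# from previous years' cutoffs (data values are invented).
import numpy as np
from sklearn.linear_model import LinearRegression

years   = np.array([[2016], [2017], [2018], [2019], [2020]])
cutoffs = np.array([82.5, 83.1, 84.0, 84.6, 85.4])     # percentage cutoffs

model = LinearRegression().fit(years, cutoffs)
print("predicted 2021 cutoff:", round(float(model.predict([[2021]])[0]), 2))
```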
● Alpha Power-Kumaraswamy Distribution with An Application on Survival Times of Cancer Patients
https://ojs.bilpublishing.com/index.php/jcsr/article/view/1855
● Energy-Efficient Transaction Serialization for IoT Devices
https://ojs.bilpublishing.com/index.php/jcsr/article/view/1620
● Mobile Software Assurance Informed through Knowledge Graph Construction: The OWASP Threat of Insecure Data Storage https://ojs.bilpublishing.com/index.php/jcsr/article/view/1765
● Neural Network Based Adaptation Algorithm for Online Prediction of Mechanical Properties of Steel
https://ojs.bilpublishing.com/index.php/jcsr/article/view/1966
● Research on Innovative Model Based on Solving Demand Pain Points in Software Development
https://ojs.bilpublishing.com/index.php/jcsr/article/view/1957
Fuzzy C-means clustering on rainfall flow optimization technique for medical ...IAESIJAI
Due to various deadly diseases in the world, medical data clustering is a very
challenging and critical task: proper decisions must be taken from
multidimensional complex data in an effective manner. K-means is the most familiar
and suitably speedy clustering algorithm compared with other traditional
clustering approaches. But K-means is very sensitive to the initialization of the
cluster centroids and can easily get stuck. Thus, there is a necessity for
faster clustering with an effective, optimal cluster centroid. Based on
that, this research paper proposes an optimization-based clustering by hybrid
fuzzy C-means (FCM) clustering with the rainfall flow optimization technique
(RFFO), which mimics the natural flow and behavior of rainfall from one
position to another. The FCM clustering algorithm is used to cluster the
given medical data and RFFO is used to produce the optimal cluster
centroids. Finally, the clustering performance of the
proposed FCM clustering with the RFFO technique is measured with the help of accuracy,
the random coefficient, and the Jaccard coefficient for the medical data set, to find the
risk factor of a heart attack.
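The RFFO optimiser itself is not reproduced here; as a small illustration of the evaluation step, clustering labels can be compared with ground truth using the adjusted Rand index and the Jaccard score (standing in, under assumption, for the "random coefficient" and "Jaccard coefficient" named above):

```python
# Illustrative external evaluation of a clustering against known labels,
# using the adjusted Rand index and a Jaccard-style score (assumed to play the
# role of the coefficients named in the abstract; not the paper's exact setup).
from sklearn.metrics import adjusted_rand_score, jaccard_score

true_labels      = [0, 0, 1, 1, 1, 0]
predicted_labels = [0, 0, 1, 0, 1, 0]

print("adjusted Rand:", adjusted_rand_score(true_labels, predicted_labels))
print("Jaccard:", jaccard_score(true_labels, predicted_labels))
```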
Comparative study of frequent item set in data miningijpla
In this paper, we give an overview of existing frequent itemset mining algorithms. Frequent itemset
mining algorithms are very popular these days, but frequent itemset mining is a computationally
expensive task. Here we describe the different processes used for itemset mining, and we also compare
the different concepts and algorithms used for the generation of frequent itemsets. From all the
types of frequent itemset mining algorithms that have been developed, we compare the important ones,
analyzing their run-time performance.
Efficient Database Management System For Wireless Sensor Network Onyebuchi nosiri
An effective database management system has been put forward in this work to tackle the problem of remote monitoring using a Wireless Sensor Network. An Object Oriented Analysis and Design method was employed, in which classes were evolved to create objects in the program used. An algorithm was developed, with a corresponding flowchart, to realize the design. The work also came up with a dynamic graph plotter, as this offers an adaptive monitoring facility for data stored in the database. A sensor node query was implemented, and the result of the transmitted data was filtered for a particular node.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
The data mining environment produces a large amount of data that needs to be
analysed; patterns have to be extracted from it to gain knowledge. In this new period, with
the explosion of data both ordered and unordered, it has become difficult to process, manage and
analyse patterns using traditional databases and architectures. To gain knowledge about
Big Data, a proper architecture should be understood. Classification is an important data mining
technique with broad applications, used to classify the various kinds of data encountered in nearly every
field of our lives. Classification assigns an item to a class according to the features of the item
with respect to a predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and sheds light on various classification algorithms, including
J48, C4.5, the k-nearest neighbor classifier, Naive Bayes, SVM, etc., using the random concept.
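A compact way to reproduce the kind of comparison such a survey discusses is to cross-validate several of the named classifiers on a common dataset; the sketch below approximates J48/C4.5 with scikit-learn's decision tree:

```python
# Illustrative side-by-side comparison of classifiers named in the survey
# (J48/C4.5 is approximated here by scikit-learn's decision tree).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}
for name, model in models.items():
    print(f"{name:14s} accuracy={cross_val_score(model, X, y, cv=5).mean():.3f}")
```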
ENHANCING ENGLISH WRITING SKILLS THROUGH INTERNET-PLUS TOOLS IN THE PERSPECTI...ijfcstjournal
This investigation delves into incorporating a hybridized memetic strategy within the framework of English
composition pedagogy, leveraging Internet Plus resources. The study aims to provide an in-depth analysis
of how this method influences students’ writing competence, their perceptions of writing, and their
enthusiasm for English acquisition. Employing an explanatory research design that combines qualitative
and quantitative methods, the study collects data through surveys, interviews, and observations of students’
writing performance before and after the intervention. Findings demonstrate a beneficial impact of
integrating the memetic approach alongside Internet Plus tools on the writing aptitude of English as a
Foreign Language (EFL) learners. Students reported increased engagement with writing, attributing it to
the use of Internet plus tools. They also expressed that the memetic approach facilitated a deeper
understanding of cultural and social contexts in writing. Furthermore, the findings highlight a significant
improvement in students’ writing skills following the intervention. This study provides significant insights
into the practical implementation of the memetic approach within English writing education, highlighting
the beneficial contribution of Internet Plus tools in enriching students' learning journeys.
A SURVEY TO REAL-TIME MESSAGE-ROUTING NETWORK SYSTEM WITH KLA MODELLINGijfcstjournal
Message routing over a network is one of the most fundamental concepts in communication, requiring
the simultaneous transmission of messages from a source to a destination. In terms of real-time routing, it
refers to the addition of a timing constraint under which messages should be received within a specified time
delay. This study involves scheduling, algorithm design and graph theory, which are essential parts of
the Computer Science (CS) discipline. Our goal is to investigate an innovative and efficient way to present
these concepts in the context of CS education. In this paper, we explore the fundamental modelling of
routing real-time messages on networks. We study whether it is possible to have an optimal on-line
algorithm for the Arbitrary Directed Graph network topology. In addition, we examine the message
routing's algorithmic complexity by breaking down the complex mathematical proofs into concrete, visual
examples. Next, we explore the Unidirectional Ring topology in finding the transmission's
"makespan". Lastly, we propose the same network modelling through the technique of Kinesthetic Learning
Activity (KLA). We analyse the data collected and present the results in a case study to evaluate the
effectiveness of the KLA approach compared to the traditional teaching method.
A COMPARATIVE ANALYSIS ON SOFTWARE ARCHITECTURE STYLESijfcstjournal
Software architecture is the structural solution that achieves the overall technical and operational
requirements of software development. Software engineers apply software architectures in their
software system developments; however, they struggle with the basic benchmarks needed to select software
architecture styles, possible components, integration methods (connectors) and the exact application of
each style.
The objective of this research work was a comparative analysis of software architecture styles by their
weaknesses and benefits, so that they can be selected by the programmer at design time. Finally, in this study,
the researcher has identified the architectural styles, their weaknesses, strengths and application areas,
together with the component, connector and interface of each selected architectural style.
SYSTEM ANALYSIS AND DESIGN FOR A BUSINESS DEVELOPMENT MANAGEMENT SYSTEM BASED...ijfcstjournal
A design of a sales system for professional services requires a comprehensive understanding of the
dynamics of sale cycles and how key knowledge for completing sales is managed. This research describes
a design model of a business development (sales) system for professional service firms based on the Saudi
Arabian commercial market, which takes into account the new advances in technology while preserving
unique or cultural practices that are an important part of the Saudi Arabian commercial market. The
design model has combined a number of key technologies, such as cloud computing and mobility, as an
integral part of the proposed system. An adaptive development process has also been used in implementing
the proposed design model.
AN ALGORITHM FOR SOLVING LINEAR OPTIMIZATION PROBLEMS SUBJECTED TO THE INTERS...ijfcstjournal
Frank t-norms are parametric family of continuous Archimedean t-norms whose members are also strict
functions. Very often, this family of t-norms is also called the family of fundamental t-norms because of the
role it plays in several applications. In this paper, optimization of a linear objective function with fuzzy
relational inequality constraints is investigated. The feasible region is formed as the intersection of two
inequality fuzzy systems defined by frank family of t-norms is considered as fuzzy composition. First, the
resolution of the feasible solutions set is studied where the two fuzzy inequality systems are defined with
max-Frank composition. Second, some related basic and theoretical properties are derived. Then, a
necessary and sufficient condition and three other necessary conditions are presented to conceptualize the
feasibility of the problem. Subsequently, it is shown that a lower bound is always attainable for the optimal
objective value. Also, it is proved that the optimal solution of the problem is always resulted from the
unique maximum solution and a minimal solution of the feasible region. Finally, an algorithm is presented
to solve the problem and an example is described to illustrate the algorithm. Additionally, a method is
proposed to generate random feasible max-Frank fuzzy relational inequalities. By this method, we can
easily generate a feasible test problem and employ our algorithm to it.
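For reference, the Frank family of t-norms referred to above is usually written as follows (standard textbook form, not quoted from the paper):

```latex
% Frank family of t-norms, parameterised by s (standard definition).
T^{F}_{s}(x,y)=
\begin{cases}
\min(x,y) & s = 0,\\[2pt]
x\,y & s = 1,\\[2pt]
\max(x+y-1,\,0) & s = \infty,\\[2pt]
\log_{s}\!\left(1+\dfrac{(s^{x}-1)(s^{y}-1)}{s-1}\right) & s\in(0,\infty)\setminus\{1\}.
\end{cases}
```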
LBRP: A RESILIENT ENERGY HARVESTING NOISE AWARE ROUTING PROTOCOL FOR UNDER WA...ijfcstjournal
The underwater sensor network is one of the most difficult and fascinating research arenas, and it has
attracted plenty of researchers to this field of study. In several underwater-based
sensor applications, nodes are deployed in the water and through this the energy is affected. Thus, the mobility
of the sensor nodes through the water environment, caused by the water flow, has to be accounted for in sensor-based
protocol formations. Researchers have developed many routing protocols; however, those lost their charm
with time. It is the demand of the age to supply an energy-efficient and
scalable robust routing protocol for underwater actuator networks. In this work, the authors
propose a routing protocol named Level Based Routing Protocol (LBRP), aiming to
offer robust, scalable and energy-efficient routing. LBRP also guarantees the most effective use
of total energy consumption and ensures packet transmission, which provides additional reliability
compared to other routing protocols. In this work, the authors have used the level of the forwarding node,
its residual energy and the distance from the forwarding node to the sending node as criteria in the multicasting
technique comparisons. Throughout this work, the authors obtained a recognition result of about
86.35% on average in node multicasting performance. Simulation has been carried out in both a
noisy and a quiet environment, which endorses the higher performance of the proposed
protocol.
STRUCTURAL DYNAMICS AND EVOLUTION OF CAPSULE ENDOSCOPY (PILL CAMERA) TECHNOLO...ijfcstjournal
This research paper examines and re-evaluates the technological innovation, theory, structural dynamics
and evolution of Pill Camera (Capsule Endoscopy) technology in redirecting the manner of small
bowel (intestine) examination in humans. The Pill Camera (Endoscopy Capsule), made up of sealed
biocompatible material to withstand acid, enzymes and other antibody chemicals in the stomach, is a
technology that helps medical practitioners, especially general physicians and
gastroenterologists, to examine and re-examine the intestine for possible bleeding or infection. Before the
advent of the Pill Camera (Endoscopy Capsule), colonoscopy was the usual method, but research
showed that some parts (bowel) of the intestine cannot be reached by this traditional method, hence the need
for the Pill Camera. Countless deaths from stomach diseases such as polyps, inflammatory bowel
(Crohn's) disease, cancers, ulcers, anaemia and tumours of the small intestine, which would ordinarily have
been detected by sophisticated technology like the Pill Camera, have become the norm in developing nations.
Nevertheless, this paper not only examines and re-evaluates the Pill Camera innovation, theory,
structural dynamics and evolution, it also aims to create awareness for both medical
practitioners and the public.
AN OPTIMIZED HYBRID APPROACH FOR PATH FINDINGijfcstjournal
A path finding algorithm addresses the problem of finding the shortest path from source to destination while avoiding
obstacles. There exist various search algorithms, namely A*, Dijkstra's and ant colony optimization. Unlike
most path finding algorithms, which require destination co-ordinates to compute the path, the proposed
algorithm comprises a new method which finds the path using backtracking without requiring destination
co-ordinates. Moreover, in existing path finding algorithms, the number of iterations required to find a path is
large. Hence, to overcome this, an algorithm is proposed which reduces the number of iterations required to
traverse the path. The proposed algorithm is a hybrid of backtracking and a new technique (a modified 8-
neighbor approach). The proposed algorithm can become an essential part of location-based, network and gaming
applications, grid traversal, navigation, mobile robotics and Artificial Intelligence.
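The modified 8-neighbour technique itself is not described in the abstract; as a baseline illustration only, a plain backtracking search over a grid with 8-neighbour moves can be sketched as follows:

```python
# Minimal backtracking path search on a grid with 8-neighbour moves
# (illustrative baseline only; it is not the paper's modified technique,
# which works without destination coordinates).
GRID = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]  # 0 = free cell, 1 = obstacle

MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
         (0, 1), (1, -1), (1, 0), (1, 1)]

def backtrack(pos, goal, path, visited):
    if pos == goal:
        return list(path)
    r, c = pos
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        nxt = (nr, nc)
        if (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
                and GRID[nr][nc] == 0 and nxt not in visited):
            visited.add(nxt)
            path.append(nxt)
            result = backtrack(nxt, goal, path, visited)
            if result:
                return result
            path.pop()            # backtrack: undo the move and try another
            visited.remove(nxt)
    return None

start, goal = (0, 0), (3, 3)
print(backtrack(start, goal, [start], {start}))
```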
EAGRO CROP MARKETING FOR FARMING COMMUNITYijfcstjournal
The major occupation in India is agriculture; the people involved in agriculture largely belong to the poor
class and category. The people of the farming community are unaware of the new techniques and agro-machines which would lift the field of agriculture to greater heights. Though the farmers
work hard, they are cheated by agents in today's market. This serves as an opportunity to solve
all the problems that farmers face in the current world. The eAgro crop marketing site will serve as a better
way for farmers to sell their products within the country with some basic knowledge of using
the website. It provides information to the farmers about the current market rate of agro-products,
their sale history and the profits earned in a sale. This site will also help the farmers to learn about market
information and to view the agricultural schemes the Government provides to farmers.
EDGE-TENACITY IN CYCLES AND COMPLETE GRAPHSijfcstjournal
It is well known that the tenacity is a proper measure for studying vulnerability and reliability in graphs.
Here, a modified edge-tenacity of a graph is introduced based on the classical definition of tenacity.
Properties and bounds for this measure are introduced; meanwhile edge-tenacity is calculated for cycle
graphs and also for complete graphs.
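For context, the classical (vertex) tenacity on which the modified edge-tenacity is based is commonly defined as below; the paper's edge variant replaces the vertex cut set with an edge set (definition recalled from the standard literature, not quoted from the paper):

```latex
% Classical (vertex) tenacity of a non-complete graph G; \tau(G-S) is the order
% of a largest component of G-S and \omega(G-S) the number of components.
T(G) \;=\; \min_{S \subseteq V(G)}
\left\{ \frac{\lvert S\rvert + \tau(G-S)}{\omega(G-S)} \;:\; \omega(G-S) > 1 \right\}
```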
COMPARATIVE STUDY OF DIFFERENT ALGORITHMS TO SOLVE N QUEENS PROBLEMijfcstjournal
This paper gives a brief description of the Genetic Algorithm (GA), the Simulated Annealing (SA) algorithm, the Backtracking (BT) algorithm and the Brute Force (BF) search algorithm, explains how the proposed GA, the proposed SA using GA, the BT algorithm and the BF search algorithm can be employed to find the best solution of the N Queens problem, and compares the four algorithms. Although it is primarily a review-based work, the four algorithms were written and implemented. From the results, it was found that the proposed GA performed better than, and provided a better fitness value (solution) than, the proposed SA using GA, the BT algorithm and the BF search algorithm for different values of N. It was also noticed that the proposed GA took more time to produce a result than the proposed SA using GA.
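Of the four methods compared, backtracking is the easiest to state concretely; below is a minimal, standard column-by-column backtracking solver in Java (an illustrative sketch, not the exact code evaluated in the paper):

public class NQueensBT {
    // queens[c] = row of the queen placed in column c
    static boolean place(int[] queens, int col, int n) {
        if (col == n) return true;                 // all columns filled: a solution exists
        for (int row = 0; row < n; row++) {
            if (isSafe(queens, col, row)) {
                queens[col] = row;
                if (place(queens, col + 1, n)) return true;
            }
        }
        return false;                              // no row works in this column: backtrack
    }

    static boolean isSafe(int[] queens, int col, int row) {
        for (int c = 0; c < col; c++) {
            if (queens[c] == row) return false;                        // same row
            if (Math.abs(queens[c] - row) == col - c) return false;    // same diagonal
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 8;
        int[] queens = new int[n];
        System.out.println("solution found for N=" + n + ": " + place(queens, 0, n));
    }
}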
PSTECEQL: A NOVEL EVENT QUERY LANGUAGE FOR VANET’S UNCERTAIN EVENT STREAMSijfcstjournal
In recent years, complex event processing technology has been used to process the VANET's temporal and spatial event streams. However, accurate data usually cannot be obtained because of the sensing accuracy limitations of the system's devices; only uncertain data can be collected from the complex and constrained VANET environment. Because the VANET's event streams consist of uncertain data, the streams themselves are also uncertain. How to effectively express and process these uncertain event streams has become a core issue for VANET systems. To solve this problem, we propose a novel complex event query language, PSTeCEQL (probabilistic spatio-temporal constraint event query language). Firstly, we define the possible-world model of the VANET's uncertain event streams. Secondly, we propose the event query language PSTeCEQL and give its syntax and operational semantics. Finally, we illustrate the validity of PSTeCEQL with an example.
CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON...ijfcstjournal
Nowadays an enormous amount of data is generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent Itemset Mining algorithms aim to discover frequent itemsets from a transactional database, but as the dataset size increases it cannot be handled by traditional frequent itemset mining. The MapReduce programming model addresses large datasets but incurs a large communication cost, which reduces execution efficiency. The proposed approach applies a new k-means pre-processing step to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing technique.
A MUTATION TESTING ANALYSIS AND REGRESSION TESTINGijfcstjournal
Software testing is conducted to provide the client with information about the quality of the product under test. It can also provide an objective, independent view of the software that allows the business to appreciate and understand the risks of software implementation. In this paper we focus on two kinds of software testing: mutation testing and regression testing. Mutation testing is a structural testing method, i.e. the structure of the code is used to guide the test program. A mutation is a small change in a program; such changes are applied to model the low-level defects that arise in the process of coding systems. Ideally, mutations should model low-level defect creation. Mutation testing is a process in which the code is modified and the mutated code is then tested against the test suites. The mutations applied to the source code are designed to mimic common programming errors. A good unit test typically detects the mutated program and fails accordingly. Mutation testing is used on many different platforms, including Java, C++, C# and Ruby. Regression testing is a type of software testing that seeks to uncover new software bugs, or regressions, in existing functional and non-functional areas of a system after changes such as enhancements, patches or configuration changes have been made. When defects are found during testing, they are fixed and that part of the software starts working as needed; however, the fixes may have introduced or uncovered a different defect elsewhere in the software. Regression testing is the way to detect and fix these unexpected bugs. Its main focus is to verify that changes to the software have not produced adverse side effects and that the software still meets its requirements. Regression tests are run whenever changes are made to the software, for example because of modified functions.
GREEN WSN- OPTIMIZATION OF ENERGY USE THROUGH REDUCTION IN COMMUNICATION WORK...ijfcstjournal
Advances in micro-fabrication and communication techniques have led to an unimaginable proliferation of WSN applications. Research is focussed on reducing setup and operational energy costs, the bulk of which is linked to the communication activities of a WSN. Any progress towards energy efficiency therefore has the potential for huge savings globally, and every energy-efficient step is an endeavour to cut costs and 'Go Green'. In this paper, we propose a framework to reduce the communication workload through in-network compression and multiple-query synthesis at the base station, and through modification of the query syntax by introducing static variables. These are general approaches which can be used in any WSN irrespective of the application.
A NEW MODEL FOR SOFTWARE COSTESTIMATION USING HARMONY SEARCHijfcstjournal
Accurate and realistic estimation has always been a great challenge in the software industry. Software Cost Estimation (SCE) is the standard practice used to manage software projects, and the estimates made in the initial stages of a project drive the planning of its other activities. Estimation is confronted with a number of uncertainties and barriers, and assessing previous projects is essential to address them. Several models have been developed for the analysis of software projects. The classical reference method is the COCOMO model; other applied methods include Function Point (FP) and Lines of Code (LOC), and expert opinion also matters in this regard. In recent years, the growth of meta-heuristic algorithms and their combination with high-accuracy models have brought great achievements in software engineering. Meta-heuristic algorithms, which can analyse data from multiple dimensions and identify the optimum solution among them, are analytical tools for data analysis. In this paper, we use the Harmony Search (HS) algorithm for SCE. The proposed model has been assessed on a collection of 60 standard projects from the NASA60 dataset. The experimental results show that the HS algorithm is a good way to determine the weights of the similarity-measure factors of software effort and to reduce the MRE error.
International Journal on Foundations of Computer Science & Technology (IJFCST)ijfcstjournal
International Journal on Foundations of Computer Science & Technology (IJFCST) is a bi-monthly peer-reviewed and refereed open access journal that publishes articles contributing new results in all areas of the foundations of computer science & technology. Over the last decade, there has been an explosion in the field of computer science in solving various problems from mathematics to engineering. This journal aims to provide a platform for exchanging ideas in new emerging trends that need more focus and exposure, and will attempt to publish proposals that strengthen our goals. Topics of interest include, but are not limited to, the following:
Because technology has been used so widely in recent decades, cybercrime has become a significant international issue as a result of the huge damage it causes to businesses and even to ordinary users of technology. The main aim of this paper is to shed light on digital crimes and give an overview of what a person related to computer science needs to know about this new type of crime. The paper has three sections: Introduction to Digital Crime, which gives fundamental information about digital crimes; Digital Crime Investigation, which presents different investigation models; and a third section on Cybercrime Law.
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...ijfcstjournal
In this paper, we analyze the evolution of a small-world network and its subsequent transformation to a
random network using the idea of link rewiring under the well-known Watts-Strogatz model for complex
networks. Every link u-v in the regular network is considered for rewiring with a certain probability and if
chosen for rewiring, the link u-v is removed from the network and the node u is connected to a randomly
chosen node w (other than nodes u and v). Our objective in this paper is to analyze the distribution of the
maximal clique size per node by varying the probability of link rewiring and the degree per node (number
of links incident on a node) in the initial regular network. For a given probability of rewiring and initial
number of links per node, we observe the distribution of the maximal clique size per node to follow a Poisson distribution. We also observe the maximal clique size per node in the small-world network to be very close to the average value and to the maximal clique size in a regular network. There is no appreciable decrease in the maximal clique size per node when the network transforms from a regular network to a small-world network. On the other hand, when the network transforms from a small-world network to a random network, the average maximal clique size decreases significantly.
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMSijfcstjournal
This research paper is a statistical comparative study of a few average-case asymptotically optimal sorting algorithms, namely Quick sort, Heap sort and K-sort. The three sorting algorithms, all with the same average-case complexity, have been compared by obtaining the corresponding statistical bounds while subjecting these procedures to randomly generated data from some standard discrete and continuous probability distributions, such as the Binomial distribution, the discrete and continuous Uniform distributions and the Poisson distribution. The statistical analysis is well supplemented by a parameterized complexity analysis.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
AGENT ENABLED MINING OF DISTRIBUTED PROTEIN DATA BANKS
International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.5, No.3, May 2015
DOI: 10.5121/ijfcst.2015.5303
AGENT ENABLED MINING OF DISTRIBUTED
PROTEIN DATA BANKS
G. S. Bhamra1, A. K. Verma2 and R. B. Patel3
1 M. M. University, Mullana, Haryana, 133207 - India
2 Thapar University, Patiala, Punjab, 147004 - India
3 Chandigarh College of Engineering & Technology, Chandigarh - 160019 - India
ABSTRACT
Mining biological data is an emergent area at the intersection between bioinformatics and data mining
(DM). The intelligent agent based model is a popular approach in constructing Distributed Data Mining
(DDM) systems to address scalable mining over large scale distributed data. The nature of associations
between different amino acids in proteins has also been a subject of great interest. There is a strong need to
develop new models and exploit and analyze the available distributed biological data sources. In this study,
we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative
Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). Such
globally strong association rules enhance understanding of protein composition and are desirable for
synthesis of artificial proteins. A real protein data bank is used to validate the system.
KEYWORDS
Knowledge Discovery, Association Rules, Intelligent Agents, Multi-Agent System, Bioinformatics.
1. INTRODUCTION
Data Mining (DM) is a process to automatically extract interesting and valid data patterns or trends representing knowledge implicitly stored in large databases [1], [2]. Distributed Data Mining (DDM) is concerned with the application of classical DM procedures in a distributed computing environment, trying to make the best of the available resources. In DDM, mining takes place both locally at each geographically distributed site and at a global level where the local knowledge is merged in order to discover global knowledge. A DDM system is a complex entity comprising many components: mining algorithms, communication subsystems, resource management, task scheduling, a user interface, etc. It should provide efficient access to both distributed data and computing resources, monitor the entire mining procedure, and present results to users in appropriate formats. A successful DDM system is also flexible enough to adapt to various situations [3], [4], [5], [6], [7].
Intelligent software agent technology is an interdisciplinary technology. Its motivating idea is the development and efficient utilization of autonomous software objects called agents, which have access to geographically distributed and heterogeneous information resources, in order to simplify the complexities of distributed computing. Agents are autonomous, adaptive, reactive, pro-active, social, cooperative, collaborative and flexible; they also support temporal continuity and mobility within the network. An intelligent agent with the mobility feature is known as a Mobile Agent (MA). An MA migrates from node to node in a heterogeneous network without losing its operability and can continue to function even if the user is disconnected from the network. On reaching a node, the MA is delivered to an Agent Execution Environment (AEE) where its executable parts are run. Upon completion of the desired task, it delivers the results to the home node. With an MA, a single serialized object is transmitted over the network, carrying only the small amount of resultant data, thus reducing network bandwidth consumption, latency (response time delay) and network traffic. An AEE, or Mobile Agent Platform (MAP), is a server application that provides the functionality MAs need to authenticate, execute, communicate (with other agents, users, and other platforms), migrate to other platforms, and use system resources in a secure way. A Multi Agent System (MAS) is a distributed application composed of multiple interacting intelligent agent components [8].
Table 1. Single letter codes for amino acids

Sr. No.  Amino Acid Name  Single Letter Code  Three Letter Code
1        Alanine          A                   Ala
2        Cysteine         C                   Cys
3        Aspartic Acid    D                   Asp
4        Glutamic Acid    E                   Glu
5        Phenylalanine    F                   Phe
6        Glycine          G                   Gly
7        Histidine        H                   His
8        Isoleucine       I                   Ile
9        Lysine           K                   Lys
10       Leucine          L                   Leu
11       Methionine       M                   Met
12       Asparagine       N                   Asn
13       Proline          P                   Pro
14       Glutamine        Q                   Gln
15       Arginine         R                   Arg
16       Serine           S                   Ser
17       Threonine        T                   Thr
18       Valine           V                   Val
19       Tryptophan       W                   Trp
20       Tyrosine         Y                   Tyr
Bioinformatics, or computational molecular biology, aims at the automated analysis and management of high-throughput biological data as well as the modelling and simulation of complex biological systems. Bioinformatics has changed greatly since the first sequence alignment algorithms in the 1970s [9]. Today in-silico analysis is a fundamental component of biomedical research, and bioinformatics now encompasses a wide range of subject areas from structural biology and genomics to gene expression studies [10], [11]. The post-genomic era has resulted in the availability of enormous amounts of distributed biological data that require suitable tools and methods for modelling and analyzing biological processes and sequences. The bioinformatics research community feels a strong need to develop new models and to exploit and analyze the available genomes [12]. Protein sequences are made up of 20 types of Amino Acids (AA), each represented by a single letter as shown in Table 1. The unique 3-dimensional structure of each protein is determined completely by its amino-acid sequence. Proteins are important constituents of the cellular machinery of any organism, and the functioning of a protein depends heavily upon its AA sequence; a slight change in the sequence might completely change the functioning of the protein. Therefore, the nature of associations between different amino acids has been a subject of great interest. In [20], the authors mine association rules among amino acids in protein sequences. As per the available literature, no researcher has applied agent technology to mine association rules for amino acids from distributed protein data sets. The amalgamation of DDM and MAS provides a rewarding solution in terms of security, scalability, storage cost, computation cost and communication cost. Mining biological data is an emerging area and continues to be an extremely important problem, both for DM and for the biological sciences [21]. The state of the art in the use of agent technology in bioinformatics is reviewed in [22].
The rest of the paper is organised as follows. Section 2 discusses distributed association rule mining and the preliminary notations used in this paper. A running environment for the proposed system is presented in Section 3, along with the algorithms used for the various components and the protein data banks used in this study; the layout and working of the various agents involved in the proposed system and their algorithms are also discussed. The system is implemented in Java and its performance is studied in Section 4, and finally the article is concluded in Section 5.
2. DISTRIBUTED ASSOCIATION RULE MINING
Let DB = {T_j, j = 1…D} be a transactional dataset of size D, where each transaction T is assigned an identifier (TID), and let I = {d_i, i = 1…m} be the total of m data items in DB. A set of items in a particular transaction T is called an itemset or pattern. An itemset P = {d_i, i = 1…k}, which is a set of k data items in a particular transaction T with P ⊆ I, is called a k-itemset. The support of an itemset, s(P) = (No_of_T_containing_P / D) %, is the frequency of occurrence of itemset P in DB, where No_of_T_containing_P is the support count (sup_count) of itemset P. Frequent Itemsets (FIs) are the itemsets that appear in DB frequently, i.e., if s(P) ≥ min_th_sup (a given minimum threshold support), then P is a frequent k-itemset. Finding such FIs plays an essential role in mining the interesting relationships among itemsets. Frequent Itemset Mining (FIM) is the task of finding the set of all the subsets of FIs in a transactional database. It is a CPU and input/output intensive task, mainly because of the large size of the datasets involved [2].
Association Rules (ARs), first introduced in [13], are used to discover the associations (or co-occurrences) among items in a database. An AR is an implication of the form P ⇒ Q [support, confidence], where P ⊂ I, Q ⊂ I, and P and Q are disjoint itemsets, i.e., P ∩ Q = ∅. An AR is measured in terms of its support (s) and confidence (c) factors. An AR P ⇒ Q is said to be strong if s(P ⇒ Q) ≥ min_th_sup (a given minimum threshold support) and c(P ⇒ Q) ≥ min_th_conf (a given minimum threshold confidence). Association Rule Mining (ARM) is today one of the most important DM tasks. In ARM, all the strong ARs are generated from the FIs. ARM can be viewed as a two-step process [14], [15]:
1. Find all the frequent k-itemsets (L_k).
2. Generate strong ARs from L_k:
   a. For each frequent itemset l ∈ L_k, generate all non-empty subsets of l.
   b. For every non-empty subset s of l, output the rule "s ⇒ (l − s)" if sup_count(l) / sup_count(s) ≥ min_th_conf.
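Step 2 can be illustrated with a short Java sketch that enumerates the non-empty proper subsets of one frequent itemset l and keeps the rules whose confidence sup_count(l)/sup_count(s) meets the threshold (illustrative only; the items and support counts below are invented, and the support-count table is assumed to come from step 1):

import java.util.*;

public class StrongRuleGenerator {
    // Emits all strong rules s => (l - s) for one frequent itemset l.
    static List<String> strongRules(List<Integer> l, Map<Set<Integer>, Integer> supCount, double minConf) {
        List<String> rules = new ArrayList<>();
        int n = l.size();
        int countL = supCount.get(new HashSet<>(l));
        for (int mask = 1; mask < (1 << n) - 1; mask++) {      // every non-empty proper subset of l
            Set<Integer> s = new HashSet<>();
            for (int i = 0; i < n; i++) if ((mask & (1 << i)) != 0) s.add(l.get(i));
            double conf = (double) countL / supCount.get(s);
            if (conf >= minConf) {
                Set<Integer> rest = new HashSet<>(l);
                rest.removeAll(s);
                rules.add(s + " => " + rest + " (conf=" + conf + ")");
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        Map<Set<Integer>, Integer> supCount = new HashMap<>();   // invented support counts
        supCount.put(new HashSet<>(Arrays.asList(8, 23)), 900);
        supCount.put(new HashSet<>(Arrays.asList(8)), 1500);
        supCount.put(new HashSet<>(Arrays.asList(23)), 1200);
        System.out.println(strongRules(Arrays.asList(8, 23), supCount, 0.5));
    }
}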
Distributed Association Rule Mining (DARM) generates the globally strong association rules from the global FIs in a distributed environment. Because of the intrinsic data skew of a distributed database, it is desirable to mine the global rules for global business decisions and the local rules for local business decisions. A comparative analysis of existing agent based DARM systems can be found in [23].
A few preliminary notations and definitions required for defining DARM and for making this study self-contained are as follows:
• S = {S_i, i = 1…n}, the n distributed sites.
• S_CENTRAL, the central site.
• PDB_i = {PR_m, m = 1…X_i}, the Protein Data Bank of X_i protein records at site S_i; each PR_m has two main parts: the description headers (PD) for the structural classification of the protein, and the protein sequence (PS), i.e., the chain of amino acids. A snapshot of PDB_1 is shown in Appendix A.1.
• FPDB_i = {PR_k, k = 1…D_i}, the Filtered Protein Data Bank of D_i protein records at site S_i, where the length of the PS in each PR is in the range ≥ 50 and < 400. A snapshot of FPDB_1 is shown in Appendix A.2.
• AAF_i, the data bank of amino acid frequencies for each PR ∈ FPDB_i at site S_i. AAF_1 is shown in Appendix A.3; AAF_2 and AAF_3 are not shown due to space constraints.
• BDB_i, the Boolean Data Bank at site S_i, which contains the value '1' if the frequency of an amino acid lies in the specific range and '0' otherwise. A snapshot of BDB_1 is shown in Appendix A.5.
• IDB_i, the Itemset Data Bank of frequency partition items at site S_i, which maps each boolean value '1' to its frequency partition item number. A snapshot of IDB_1 is shown in Appendix A.6.
• L^FI_k(i), the local frequent k-itemsets at site S_i.
• L^FISC_k(i), the list of support counts for every itemset in L^FI_k(i).
• L^LSAR_i, the list of locally strong association rules at site S_i.
• L^TLSAR = ∪_{i=1..n} L^LSAR_i, the list of total locally strong association rules.
• L^TFI_k = ∪_{i=1..n} L^FI_k(i), the list of total frequent k-itemsets.
• L^GFI_k = ∩_{i=1..n} L^FI_k(i), the list of global frequent k-itemsets.
• L^GSAR_CENTRAL, the list of globally strong association rules.
The Local Knowledge Base (LKB) at site S_i comprises L^FI_k(i), L^FISC_k(i) and L^LSAR_i, which can provide a reference to the local supervisor for local decisions. The Global Knowledge Base (GKB) at S_CENTRAL comprises L^TLSAR, L^TFI_k, L^GFI_k and L^GSAR_CENTRAL for global decision making. Like ARM, the DARM task can also be viewed as a two-step process [15]:
1. Find the global frequent k-itemsets (L^GFI_k) from the distributed local frequent k-itemsets (L^FI_k(i)) of the partitioned datasets.
2. Generate the globally strong association rules (L^GSAR_CENTRAL) from L^GFI_k.
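Step 1 of DARM amounts to intersecting the local frequent k-itemset lists collected from the n sites; a minimal Java sketch of that intersection (illustrative only; itemsets are represented as sets of item numbers):

import java.util.*;

public class GlobalFrequentItemsets {
    // L^GFI_k = intersection over all sites of L^FI_k(i)
    static Set<Set<Integer>> globalFrequent(List<Set<Set<Integer>>> localFrequentPerSite) {
        Set<Set<Integer>> global = new HashSet<>(localFrequentPerSite.get(0));
        for (int i = 1; i < localFrequentPerSite.size(); i++)
            global.retainAll(localFrequentPerSite.get(i));   // keep only itemsets frequent at every site
        return global;
    }

    public static void main(String[] args) {
        // Invented local results for three sites.
        Set<Set<Integer>> site1 = new HashSet<>(Arrays.asList(Set.of(8), Set.of(23), Set.of(8, 23)));
        Set<Set<Integer>> site2 = new HashSet<>(Arrays.asList(Set.of(8), Set.of(8, 23)));
        Set<Set<Integer>> site3 = new HashSet<>(Arrays.asList(Set.of(8), Set.of(23), Set.of(8, 23)));
        // Prints the itemsets frequent at all three sites: {8} and {8, 23}.
        System.out.println(globalFrequent(Arrays.asList(site1, site2, site3)));
    }
}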
3. PROPOSED AEQARM-AAPDB SYSTEM
3.1. Environment for the proposed system
Every MAS needs an underlying AEE to provide the running infrastructure on which agents can be deployed and tested. A running environment has been designed in Java to execute all the DM agents involved in the proposed system. Various attributes of an MA are encapsulated within a data structure known as AgentProfile. AgentProfile contains the name of the MA (AgentName), its version number (AgentVersion), its entire byte code (BC), the list of nodes to be visited by the MA, i.e., the itinerary plan (L_NODES), the type of the itinerary (ItinType), which can be serial or parallel, a reference to the current execution state (AObject), and an additional data structure known as Briefcase that acts as the result bag of the MA and stores the final resultant knowledge (Result_S_i) at a particular site. The computational time (CPUTime) taken by the MA at a particular site is also stored in Result_S_i. In addition to results, Briefcase also contains the system time for the start of the agent journey (TripTime_start), the system time for the end of the journey (TripTime_end), and the total round trip time of the MA (TripTime), calculated as TripTime ← TripTime_end − TripTime_start. A data-structure sketch of AgentProfile and Briefcase is given after the component descriptions below. This environment consists of the following three components:
• Data Mining Agent Execution Environment (DM_AEE): This is the key component and acts as a server. DM_AEE is deployed on each distributed site S_i and is responsible for receiving, executing and migrating all the visiting data mining (DM) agents. It receives the incoming AgentProfile at site S_i, retrieves the entire BC of the agent, saves it as AgentName.class in the local file system of site S_i, and then starts the execution of the agent using AObject. The steps are shown in Algorithm 1.
• Agent Launcher (AL): This acts as a client at the agent launching station (S_CENTRAL) and launches the goal-oriented DM agents on behalf of the user, through a user interface, to the DM_AEE running at the distributed sites. The Agent Pool (or Zone) at S_CENTRAL is a repository of all mobile as well as stationary agents (SAs). AL first reads and stores AgentName in AgentProfile. The entire BC of the AgentName is loaded from the Agent Pool and stored in AgentProfile. L_NODES and ItinType are retrieved and stored in AgentProfile. TripTime_start is maintained in Briefcase, which is in turn added to AgentProfile. In the case of the parallel computing model, i.e., if ItinType = Parallel, AL creates L_NODES.size clones of the specific MA and dispatches each clone in parallel to all the sites listed in L_NODES; L_NODES.size threads are created to dispatch the MAs in parallel. Each clone has an AgentVersion ranging from 1 to L_NODES.size, which is used to identify each clone on the network. Before dispatching a clone of an MA to DM_AEE, the current state of the newly created i-th clone object (AObject[i]) is also stored in AgentProfile. AL also contacts the Result Manager (RM) for processing the Briefcase of an agent. Detailed steps are given in Algorithm 2.
• Result Manager (RM): This manages and processes the Briefcase of all MAs. RM is contacted either by an MA for submitting its results or by AL for processing the results of a specific MA. On completion of its itinerary, each DM agent submits its results to RM, which computes the total round trip time (TripTime) of that MA and saves it in the Briefcase of that agent. If ItinType = Parallel, it saves the AgentProfile of all the clones of the agent with that AgentName in a collection L^AllProfiles_AgentName. It is assumed that all the clones report their results to RM. RM may be equipped with a feature to handle non-reporting clones by issuing an alert to AL for the clone with the specific AgentVersion; AL then launches a new clone with that AgentVersion for the specific site. When RM is contacted by AL for processing the results of a specific agent, it sends back the collection L^AllProfiles_AgentName for all the clones of that agent. The steps are defined in Algorithm 3.
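The paper does not reproduce the Java classes themselves; a minimal data-holder sketch of AgentProfile and Briefcase, with field names taken from the description above and everything else assumed for illustration, could look like this:

import java.io.Serializable;
import java.util.*;

// Result bag carried by a mobile agent, as described in Section 3.1 (illustrative sketch).
class Briefcase implements Serializable {
    long tripTimeStart;                                     // system time at start of the journey
    long tripTimeEnd;                                       // system time at end of the journey
    long tripTime;                                          // tripTimeEnd - tripTimeStart, filled in by RM
    Map<String, Object> resultPerSite = new HashMap<>();    // Result_S_i, keyed by site
    Map<String, Long> cpuTimePerSite = new HashMap<>();     // CPUTime per visited site
}

// Attributes of a mobile agent, as described in Section 3.1 (illustrative sketch).
class AgentProfile implements Serializable {
    String agentName;                                       // AgentName
    int agentVersion;                                       // AgentVersion (clone number)
    byte[] byteCode;                                        // BC, the agent's byte code
    List<String> nodes = new ArrayList<>();                 // L_NODES, itinerary (IP addresses)
    String itinType;                                        // "Serial" or "Parallel"
    Object aObject;                                         // AObject, current execution state of the clone
    Briefcase briefcase = new Briefcase();
}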
Algorithm 1 DATA MINING AGENT EXECUTION ENVIRONMENT (DM_AEE)
1: procedure DM_AEE( )
2:   while TRUE do
3:     AgentProfile ← listen for and receive an AgentProfile at S_i
4:     AgentName ← get AgentName from AgentProfile
5:     BC ← retrieve the BC of the agent from AgentProfile
6:     save the BC as AgentName.class in the local file system of S_i
7:     AObject ← get AObject from AgentProfile          ▷ current state
8:     AObject.run( )                                    ▷ start executing the mobile agent
9:   end while
10: end procedure
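Algorithm 1 is essentially a small server loop. The following simplified Java sketch (reusing the AgentProfile sketch above) accepts a serialized profile and runs its state object; writing the byte code to disk and loading it with a class loader, as the real DM_AEE does, is only indicated in comments, and the port number is invented:

import java.io.ObjectInputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class DMAEEServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(4444)) {        // illustrative port
            while (true) {                                          // Algorithm 1, line 2
                try (Socket client = server.accept();
                     ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                    AgentProfile profile = (AgentProfile) in.readObject();   // line 3
                    System.out.println("received agent: " + profile.agentName);
                    // In the real system the byte code (profile.byteCode) would be saved as
                    // AgentName.class and loaded before execution (lines 5-6).
                    if (profile.aObject instanceof Runnable)
                        ((Runnable) profile.aObject).run();         // line 8: start executing the mobile agent
                }
            }
        }
    }
}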
Algorithm 2 AGENT LAUNCHER (AL) FOR AEQARM-AAPDB
1: procedure AL( )
2:   option ← read option (dispatch / result)
3:   switch option do
4:     case dispatch                                  ▷ dispatch the mobile agent to DM_AEE
5:       AgentName ← read the mobile agent's name
6:       BC ← load the entire byte code of AgentName from the Agent Pool
7:       add AgentName and BC to AgentProfile
8:       L_NODES ← read the itinerary (IP addresses) of the mobile agent
9:       ItinType ← read ItinType (Serial / Parallel)
10:      add ItinType to AgentProfile
11:      if ItinType = "Parallel" then                ▷ parallel itinerary
12:        AObject[L_NODES.size]                      ▷ array of agent objects for clone references
13:        TripTime_start ← get the system time for the start of the agent journey
14:        add TripTime_start to Briefcase
15:        add Briefcase to AgentProfile
16:        switch AgentName do
17:          case PDBFA
18:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
19:              AgentVersion ← i
20:              add AgentVersion to AgentProfile
21:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
22:              AObject[i] ← new PDBFA(AgentProfile)
23:              add AObject[i] to AgentProfile       ▷ clone's state
24:              transfer AgentProfile to DM_AEE at NodeAddress
25:            end for
26:          end case
27:          case AAFFA
28:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
29:              AgentVersion ← i
30:              add AgentVersion to AgentProfile
31:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
32:              AObject[i] ← new AAFFA(AgentProfile)
33:              add AObject[i] to AgentProfile       ▷ clone's state
34:              transfer AgentProfile to DM_AEE at NodeAddress
35:            end for
36:          end case
37:          case FMIDBGA
38:            max_freq ← read the maximum frequency range for amino acids
39:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
40:              AgentVersion ← i
41:              add AgentVersion to AgentProfile
42:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
43:              AObject[i] ← new FMIDBGA(AgentProfile, max_freq)
44:              add AObject[i] to AgentProfile       ▷ clone's state
45:              transfer AgentProfile to DM_AEE at NodeAddress
46:            end for
47:          end case
48:          case LKGA_P
49:            min_s ← read the minimum threshold support
50:            min_c ← read the minimum threshold confidence
51:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
52:              AgentVersion ← i
53:              add AgentVersion to AgentProfile
54:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
55:              AObject[i] ← new LKGA_P(AgentProfile, min_s, min_c)
56:              add AObject[i] to AgentProfile       ▷ clone's state
57:              transfer AgentProfile to DM_AEE at NodeAddress
58:            end for
59:          end case
60:          case LKCA_P
61:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
62:              AgentVersion ← i
63:              add AgentVersion to AgentProfile
64:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
65:              AObject[i] ← new LKCA_P(AgentProfile)
66:              add AObject[i] to AgentProfile       ▷ clone's state
67:              transfer AgentProfile to DM_AEE at NodeAddress
68:            end for
69:          end case
70:          case GKDA_P
71:            L^GSAR_CENTRAL ← load L^GSAR_CENTRAL generated by RIGKGA at S_CENTRAL
72:            add L^GSAR_CENTRAL to Briefcase
73:            add the updated Briefcase to AgentProfile
74:            for i ← 1, L_NODES.size do             ▷ for each node in the itinerary
75:              AgentVersion ← i
76:              add AgentVersion to AgentProfile
77:              NodeAddress ← L_NODES.get(i)         ▷ get an IP address
78:              AObject[i] ← new GKDA_P(AgentProfile)
79:              add AObject[i] to AgentProfile       ▷ clone's state
80:              transfer AgentProfile to DM_AEE at NodeAddress
81:            end for
82:          end case
83:        end switch
84:      end if
85:    end case
86:    case result                                    ▷ process the results of the mobile agent
87:      AgentName ← read the mobile agent's name
88:      ItinType ← read the mobile agent's itinerary type
89:      add AgentName and ItinType to L^AgentInfo
90:      if ItinType = "Parallel" then
91:        L^AllProfiles_AgentName ← contact RM with L^AgentInfo
92:        switch AgentName do
93:          case PDBFA
94:            for all AgentProfile ∈ L^AllProfiles_AgentName do   ▷ for each clone
95:              Briefcase ← retrieve Briefcase from AgentProfile
96:              process the Briefcase of the PDBFA clone
97:            end for
98:          end case
99:          case AAFFA
100:           for all AgentProfile ∈ L^AllProfiles_AgentName do   ▷ for each clone
101:             Briefcase ← retrieve Briefcase from AgentProfile
102:             process the Briefcase of the AAFFA clone
103:           end for
104:         end case
105:         case FMIDBGA
106:           for all AgentProfile ∈ L^AllProfiles_AgentName do   ▷ for each clone
107:             Briefcase ← retrieve Briefcase from AgentProfile
108:             process the Briefcase of the FMIDBGA clone
109:           end for
110:         end case
111:         case LKGA_P
112:           for all AgentProfile ∈ L^AllProfiles_AgentName do   ▷ for each clone
113:             Briefcase ← retrieve Briefcase from AgentProfile
114:             process the Briefcase of the LKGA_P clone
115:           end for
116:         end case
117:         case LKCA_P
118:           call RIGKGA(L^AllProfiles_AgentName)                ▷ stationary agent
119:         end case
120:         case GKDA_P
121:           for all AgentProfile ∈ L^AllProfiles_AgentName do   ▷ for each clone
122:             Briefcase ← retrieve Briefcase from AgentProfile
123:             process the Briefcase of the GKDA_P clone
124:           end for
125:         end case
126:       end switch
127:     end if
128:   end case
129: end switch
130: end procedure
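All the dispatch cases of Algorithm 2 follow the same pattern: create one clone per itinerary node and ship it in its own thread. A condensed Java sketch of that pattern (illustrative; it reuses the AgentProfile sketch above, uses an invented port, and omits serialization of the concrete agent object):

import java.io.ObjectOutputStream;
import java.net.Socket;
import java.util.List;

public class CloneDispatcher {
    // Dispatches one clone of the named agent to every node in the itinerary (Algorithm 2, lines 17-25).
    static void dispatchParallel(String agentName, byte[] byteCode, List<String> nodes) {
        for (int i = 0; i < nodes.size(); i++) {
            final int version = i + 1;                      // AgentVersion identifies each clone
            final String nodeAddress = nodes.get(i);
            new Thread(() -> {
                AgentProfile profile = new AgentProfile();
                profile.agentName = agentName;
                profile.agentVersion = version;
                profile.byteCode = byteCode;
                profile.nodes = nodes;
                profile.itinType = "Parallel";
                profile.briefcase.tripTimeStart = System.currentTimeMillis();
                try (Socket socket = new Socket(nodeAddress, 4444);   // DM_AEE port, illustrative
                     ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                    out.writeObject(profile);               // transfer AgentProfile to DM_AEE
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}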
Algorithm 3 RESULT MANAGER (RM)
1: procedure RM( )
2:   while TRUE do
3:     listen for and receive the incoming request
4:     if contacted by a mobile agent for submitting results from site S_i then
5:       AgentProfile ← receive the incoming AgentProfile from site S_i
6:       ItinType ← retrieve ItinType from AgentProfile
7:       Briefcase ← retrieve the mobile agent's Briefcase from AgentProfile
8:       TripTime_start ← retrieve TripTime_start from Briefcase
9:       TripTime_end ← retrieve TripTime_end from Briefcase
10:      TripTime ← TripTime_end − TripTime_start
11:      add TripTime to Briefcase
12:      add the updated Briefcase to AgentProfile
13:      if ItinType = "Parallel" then
14:        AgentName ← retrieve AgentName from AgentProfile
15:        AgentVersion ← retrieve AgentVersion from AgentProfile
16:        if AgentVersion = 1 then
17:          add AgentProfile to L^AllProfiles_AgentName
18:          save L^AllProfiles_AgentName at S_CENTRAL
19:        end if
20:        if AgentVersion > 1 then
21:          retrieve L^AllProfiles_AgentName from S_CENTRAL
22:          add AgentProfile to L^AllProfiles_AgentName
23:          save the updated L^AllProfiles_AgentName at S_CENTRAL
24:        end if
25:      end if
26:    end if
27:    if contacted by the Agent Launcher for processing the results then
28:      AgentName ← retrieve AgentName from AgentProfile
29:      ItinType ← retrieve ItinType from the incoming L^AgentInfo
30:      if ItinType = "Parallel" then
31:        L^AllProfiles_AgentName ← load L^AllProfiles_AgentName from S_CENTRAL
32:        dispatch L^AllProfiles_AgentName to the Agent Launcher
33:      end if
34:    end if
35:  end while
36: end procedure
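The bookkeeping in Algorithm 3 reduces to closing the trip-time measurement and filing each clone's profile under its agent name. A small Java sketch (illustrative; persistence at S_CENTRAL is replaced by an in-memory map):

import java.util.*;

public class ResultManagerSketch {
    // L^AllProfiles_AgentName, keyed by agent name (kept in memory here instead of saved at S_CENTRAL)
    private final Map<String, List<AgentProfile>> allProfiles = new HashMap<>();

    // Called when a clone reports back (Algorithm 3, lines 4-25).
    synchronized void submit(AgentProfile profile) {
        Briefcase bc = profile.briefcase;
        bc.tripTime = bc.tripTimeEnd - bc.tripTimeStart;        // total round trip time of the clone
        if ("Parallel".equals(profile.itinType))
            allProfiles.computeIfAbsent(profile.agentName, k -> new ArrayList<>()).add(profile);
    }

    // Called by the Agent Launcher to process results (Algorithm 3, lines 27-34).
    synchronized List<AgentProfile> profilesOf(String agentName) {
        return allProfiles.getOrDefault(agentName, Collections.emptyList());
    }
}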
Figure 1. AeQARM-AAPDB MAS
3.2. Protein Data Bank
The AeQARM-AAPDB system is tested on a real dataset of protein sequences, the Astral SCOP [16], [17] version 1.75 genetic domain sequence subsets, based on PDB SEQRES records with less than 40% identity to each other [18]. There is a total of 10569 protein records in this dataset. This single PDB is divided into three units (PDB_1, PDB_2 and PDB_3) of 3523 protein records each, which are stored at three distributed sites. Each PDB_i is further filtered to generate FPDB_i for the PS length range ≥ 50 and < 400. A total of only 9633 such filtered records (3341 in FPDB_1 + 3253 in FPDB_2 + 3039 in FPDB_3) are considered for the mining. The frequencies of the 20 amino acids in each protein sequence are retrieved and stored in AAF_i, which is further mapped into BDB_i; each amino acid has 15 frequency ranges (partitions), resulting in 300 amino acid (attribute, value) pairs as shown in Appendix A.4.
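The filtering and frequency-counting steps performed by PDBFA and AAFFA on this data can be illustrated with a short Java sketch (the record layout follows Appendix A.1, with a '#' separating the description header from the sequence; the sample record below is invented):

import java.util.*;

public class ProteinPreprocessing {
    // Keeps only records whose sequence length is >= 50 and < 400 (PDB_i -> FPDB_i).
    static List<String> filterSequences(List<String> sequences) {
        List<String> filtered = new ArrayList<>();
        for (String ps : sequences)
            if (ps.length() >= 50 && ps.length() < 400) filtered.add(ps);
        return filtered;
    }

    // Counts the frequency of each of the 20 amino acids in one sequence (one AAF_i row).
    static Map<Character, Integer> aminoAcidFrequencies(String ps) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (char aa : "ACDEFGHIKLMNPQRSTVWY".toCharArray()) freq.put(aa, 0);
        for (char c : ps.toUpperCase().toCharArray())
            if (freq.containsKey(c)) freq.merge(c, 1, Integer::sum);
        return freq;
    }

    public static void main(String[] args) {
        // Invented record: ">" + description header + "#" + a clearly synthetic 60-residue sequence.
        String record = ">illustrative description header#" + "ACDEFGHIKLMNPQRSTVWY".repeat(3);
        String ps = record.substring(record.indexOf('#') + 1);   // sequence part after the '#' separator
        System.out.println(aminoAcidFrequencies(ps));
        System.out.println("records kept after length filtering: " + filterSequences(List.of(ps)).size());
    }
}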
3.3. Layout and working of AeQARM-AAPDB system
The AeQARM-AAPDB MAS is shown in Figure 1. This MAS consists of seven agents in total: clones of the six MAs numbered 1 to 6 are dispatched from S_CENTRAL with a parallel itinerary, and the agent numbered 7 is an intelligent stationary agent (SA) running at S_CENTRAL, each performing a different task. The CPU time taken by an MA while processing at each site, along with some other specific information, is carried back in the result bag to S_CENTRAL. The relationships among these agents and their working behaviour are as follows.
1. Protein Data Bank Filtering Agent (PDBFA): Clones of this MA are dispatched in parallel to each distributed site by AL. Each clone carries the AgentProfile along with it and filters PDB_i to generate FPDB_i at each site S_i. PDBFA carries back the computational time (CPUTime) at each site S_i and TripTime_end.
2. Amino Acids Frequency Finder Agent (AAFFA): Every clone of this MA carries the AgentProfile along with it and finds the frequencies of each amino acid in every protein sequence record in FPDB_i to create AAF_i at each site S_i. AAFFA carries back the computational time (CPUTime) at each site S_i and TripTime_end.
3. Frequency Mapping and Itemset Data Bank Generator Agent (FMIDBGA): Every clone of this MA carries the AgentProfile and max_freq (the given maximum frequency range for amino acids) along with it. It divides the frequencies of each amino acid in AAF_i into intervals and maps the frequencies into Boolean values to create BDB_i for the frequency intervals, and further maps it to IDB_i for the frequency interval items (a sketch of this mapping is given after this list).
4. Local Knowledge Generator Agent with Parallel Itinerary (LKGA_P): Every cloned LKGA_P carries the AgentProfile, min_th_sup and min_th_conf along with it. This agent is embedded with the Apriori algorithm [19] for generating all the frequent k-itemset lists. The agent may be equipped with decision-making capability to select other FIM algorithms based on the density of the dataset at a particular site. It first performs FIM to generate and store L^FI_k(i) and L^FISC_k(i) at site S_i by scanning the local IDB_i at that site under the constraint of min_th_sup. It then performs ARM, applying the constraint of min_th_conf, to generate and store L^LSAR_i using L^FI_k(i) and L^FISC_k(i). The L^LSAR_i list also contains the support and confidence for each association rule along with the site name. The agent carries back the computational time (CPUTime) at each site S_i and TripTime_end.
5. Local Knowledge Collector Agent with Parallel Itinerary (LKCA_P): Every cloned LKCA_P carries the AgentProfile and collects the list of local frequent k-itemsets (L^FI_k(i)) and the list of locally strong association rules (L^LSAR_i) generated by LKGA_P. It carries these distributed results back in its result bag to RM at S_CENTRAL, where the results are integrated with the help of the RIGKGA stationary agent. In addition to this resultant knowledge, it also carries back the computational time (CPUTime) at each site S_i and TripTime_end.
6. Global Knowledge Dispatcher Agent with Parallel Itinerary (GKDA_P): Every cloned GKDA_P carries an AgentProfile containing the global knowledge (L^GSAR_CENTRAL) for further decision making and for comparison with the local knowledge at each site. It carries back the computational time (CPUTime) at each site S_i and TripTime_end.
7. Result Integration and Global Knowledge Generator Agent (RIGKGA): This is a stationary agent at S_CENTRAL, mainly used for processing the result bags of all the clones of LKCA_P. It creates the list of total frequent k-itemsets (L^TFI_k), the list of global frequent itemsets (L^GFI_k) and the list of total locally strong association rules (L^TLSAR). L^TFI_k and L^TLSAR are further processed to generate and store the L^GSAR_CENTRAL list.
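The Appendix A.4 example (Alanine with frequency 22 falling at Item No. 8) is consistent with 15 consecutive width-3 partitions (0..2, 3..5, ..., 42..44) per amino acid, giving 300 items in total. Under that assumption, the FMIDBGA mapping from an amino acid frequency to its item number could be sketched in Java as:

public class FrequencyPartitioning {
    static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";   // 20 single-letter codes (Table 1)
    static final int PARTITIONS = 15;                           // frequency ranges per amino acid
    static final int WIDTH = 3;                                 // assumed width of each range

    // Item number (1..300) for a given amino acid and its frequency in one protein record.
    static int itemNumber(char aminoAcid, int frequency) {
        int aaIndex = AMINO_ACIDS.indexOf(Character.toUpperCase(aminoAcid));   // 0..19
        int partition = Math.min(frequency / WIDTH, PARTITIONS - 1);           // 0..14, clamp overflow
        return aaIndex * PARTITIONS + partition + 1;
    }

    public static void main(String[] args) {
        // Alanine (A) with frequency 22 lies in range 21..23, i.e. Item No. 8 (Appendix A.4 example).
        System.out.println("A, freq 22 -> item " + itemNumber('A', 22));
    }
}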
Figure 2. Control Panel of AeQARM-AAPDB
Table 2. Network Configuration

Site Name   Processor   OS      IP address (a)    Network
S_CENTRAL   Intel (b)   MS (c)  192.168.46.5      NW (d)
S_1         Intel (b)   MS (c)  192.168.46.212    NW (d)
S_2         Intel (b)   MS (c)  192.168.46.189    NW (d)
S_3         Intel (b)   MS (c)  192.168.46.213    NW (d)

a. IP address with mask 255.255.255.0 and gateway 192.168.46.1
b. Intel Pentium Dual Core (3.40 GHz, 3.40 GHz) with 512 MB RAM
c. Microsoft Windows XP Professional ver. 2002
d. Network speed: 100 Mbps; network adaptor: 82566DM-2 Gigabit NIC
4. IMPLEMENTATION AND PERFORMANCE STUDY
All the agents, as well as the Control Panel of the system shown in Figure 2, are implemented in Java. The configuration required for the deployment of the system is shown in Table 2, with the additional deployment of DM_AEE at each distributed site and of AL and RM at S_CENTRAL.
Figure 7. Globally strong association rules for globally frequent 2-itemsets
Figure 8. Globally strong association rules for frequent 2- amino acids
L^FI_k(1) and L^FISC_k(1) at site S_1, generated by the LKGA_P agent with 20% min_th_sup, are shown in Appendix B.1. The locally strong association rules (L^LSAR_1) generated by LKGA_P for frequent item numbers at site S_1 are shown in Appendix B.2, and the same rules for the corresponding amino acid frequency ranges are shown in Appendix B.3. The globally strong association rules (L^GSAR_CENTRAL) for the globally frequent itemsets generated by RIGKGA are shown in Figure 7. When these item numbers are mapped to their corresponding amino acid frequency ranges, the globally strong quantitative association rules for frequent 2-amino-acid itemsets are obtained, as shown in Figure 8. The results in Figure 8 reveal that:
• < Cysteine: 0..2 > is strongly associated with < Histidine: 3..5 >, < Methionine: 0..2 >, < Methionine: 3..5 >, < Tryptophan: 0..2 >, and < Tyrosine: 3..5 >
• < Tryptophan: 0..2 > is strongly associated with < Histidine: 0..2 >, < Histidine: 3..5 >, < Methionine: 0..2 >, < Methionine: 3..5 >, < Glutamine: 3..5 > and < Tyrosine: 3..5 >
5. CONCLUSION
Mobile agents are strong candidates for designing distributed applications. DDM, when clubbed with agent technology, makes a promising alliance that gives favourable results and provides a rewarding solution for managing Big Data of ever-increasing size. In this study, a MAS called AeQARM-AAPDB is presented, which mines the strong quantitative association rules among the amino acids present in the primary structure of proteins from distributed protein data sets using intelligent agents. Such globally strong association rules help in understanding protein composition and are desirable for the synthesis of artificial proteins. Agent-based bio-data mining leaves the technical details of choosing mining algorithms, forming a hybrid system, and preparing the specific data format to the intelligent system itself, because such requirements are unreasonable for most biologists. It alleviates the technical difficulty while enhancing the reusability of the mining algorithms and the available datasets. By applying multi-agent based distributed bio-data mining, the computing load can be balanced and the computation can be carried out in a parallel and scalable manner.
REFERENCES
[1] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth & R. Uthurusamy, (1996) Advances in Knowledge
Discovery and Data Mining, AAAI/MIT Press.
[2] J. Han & M. Kamber, (2006) Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann.
[3] B. -H. Park & H. Kargupta, (2002) “Distributed Data Mining: Algorithms, Systems, and
Applications,” Department of Computer Science and Electrical Engineering, University of Maryland
Baltimore County, 1000 Hilltop Circle Baltimore, MD 21250, Available:
http://www.csee.umbc.edu/_hillol/PUBS/review.pdf
[4] G. Tsoumakas & I. Vlahavas, (2009) “Distributed Data Mining”, Department of Informatics, Aristotle
University of Thessaloniki, Thessaloniki, Greece, Available:
http://talos.csd.auth.gr/tsoumakas/publications/E1.pdf
[5] H. Kargupta & P. Chan, (2000) Advances in Distributed and Parallel Knowledge Discovery.
AAAI/MIT Press.
[6] Y. Fu, (2001) “Distributed Data Mining: An Overview”, Department of Computer Science, University
of Missouri-Rolla, Available: http://academic.csuohio.edu/fuy/Pub/tcdp01.pdf
[7] A. O. Ogunde, O. Folorunso, A. S. Sodiya & G. O. Ogunleye, (2011) “A Review of Some Issues and
Challenges in Current Agent-Based Distributed Association Rule Mining”, Asian Journal of
Information Technology, vol. 10, no. 2, pp. 84–95.
[8] G. S. Bhamra, R. B. Patel & A. K. Verma, (2014) “Intelligent Software Agent Technology: An
Overview”, International Journal of Computer Applications (IJCA), vol. 89, no. 2, pp. 19–31.
[9] A. Gibbs & G. McIntyre, (1970) “The diagram, a method for comparing sequences”, European
Journal of Biochemistry, vol. 16, pp.1-11.
[10] J. C. Setubal & J. Meidanis, (1997) “Introduction to Computational Molecular Biology”, PWS
Publishing Company.
[11] T. Attwood & D. Parry-Smith, (1999) “Introduction to Bioinformatics”, First ed. Prentice Hall.
[12] Z. Ezziane, (2006) “Applications of artificial intelligence in bioinformatics: A review”, Expert Systems with
Applications, vol. 30, pp. 2-10.
[13] R. Agrawal, T. Imielinski & A. Swami, (1993) “Mining association rules between sets of items in
large databases”, in Proceedings of the ACM-SIGMOD International Conference of Management of
Data, pp. 207–216.
[14] R. Agrawal & J. C. Shafer, (1996) “Parallel mining of association rules”, IEEE Transaction on
Knowledge and Data Engineering, vol. 8, no. 6, pp. 962–969.
[15] M. J. Zaki, (1999) “Parallel and distributed association mining: a survey”, IEEE Concurrency, vol. 7,
no. 4, pp. 14–25.
[16] Loredana Lo Conte, Steven E. Brenner, et al., (2002) “SCOP database in 2002: refinements
accommodate structural genomics”, Nucleic Acids Research, vol. 30, no. 1, pp. 264-267.
[17] Alexey G. Murzin, Steven E. Brenner, et al., (1995) “SCOP: a structural classification of proteins
database for the investigation of sequences and structures”, Journal of Molecular Biology, vol. 247,
no. 4, pp.536-540.
[18] AstralSCOP, (2009) Astral SCOP version 1.75 genetic domain sequence subsets with less than 40%
identity to each other, http://scop.berkeley.edu/astral/ver=1.75.
[19] R. Agrawal and R. Srikant, (1994) “Fast Algorithms for Mining Association Rules in Large
Databases”, In Proceedings of the 20th
International Conference on Very Large Data Bases
(VLDB'94), pp. 487-499.
[20] Nitin Gupta, Nitin Mangal, Kamal Tiwari, & Pabitra Mitra, (2006) “Mining Quantitative Association
Rules in Protein Sequences”, In Graham J. Williams & Simeon J. Simeon, editors, Data Mining,
Lecture Notes in Computer Science, vol. 3755, pp. 273-281.
[21] Qiang Yang & Xindong Wu, (2006) “10 Challenging Problems in Data Mining Research”,
International Journal of Information Technology & Decision Making, vol. 5, no. 4, pp. 597-604.
[22] Gurpreet S. Bhamra, Anil K. Verma & Ram B. Patel, (2014) “Agent Technology in Bioinformatics: A
Review”, International Journal of Next-Generation Computing, vol. 5, no. 2, pp.176-189.
[23] Gurpreet S. Bhamra, Anil K. Verma & Ram B. Patel, (2015) “Agent Based Frameworks for
Distributed Association Rule Mining: An Analysis”, International Journal in Foundations of
Computer Science & Technology (IJFCST), vol. 5, no. 1, pp. 11-22.
AUTHORS
Gurpreet Singh Bhamra is currently working as Assistant Professor at Department
of Computer Science and Engineering, M. M. University, Mullana, Haryana. He
received his B.Sc. (Computer Sc.) and MCA from Kurukshetra University,
Kurukshetra in 1995 and 1998, respectively. He is pursuing Ph.D. from Department
of Computer Science and Engineering, Thapar University, Patiala, Punjab. He has been teaching since 1998. He has published 12 research papers in International/National
Journals and International Conferences. He is a Life Member of Computer Society of
India. His research interests are in Distributed Computing, Distributed Data Mining,
Mobile Agents and Bio-informatics.
Dr. Anil Kumar Verma is currently working as Associate Professor at Department
of Computer Science & Engineering, Thapar University, Patiala. He received his
B.S., M.S. and Ph.D. in 1991, 2001 and 2008 respectively, majoring in Computer
science and engineering. He has worked as Lecturer at M.M.M. Engineering College,
Gorakhpur from 1991 to 1996. He joined Thapar Institute of Engineering &
Technology in 1996 as a Systems Analyst in the Computer Centre and is presently
associated with the same Institute. He has been a visiting faculty to many
institutions. He has published over 100 papers in referred journals and conferences
(India and Abroad). He is a MISCI (Turkey), LMCSI (Mumbai), GMAIMA (New
Delhi). He is a certified software quality auditor by MoCIT, Govt. of India. His
research interests include wireless networks, routing algorithms and securing ad hoc networks and data
mining.
Dr. Ram Bahadur Patel is currently working as Professor at Department of
Computer Science & Engineering, Chandigarh College of Engineering &
Technology, Chandigarh. He received PhD from IIT Roorkee in Computer Science &
Engineering, PDF from Highest Institute of Education, Science & Technology
(HIEST), Athens, Greece, MS (Software Systems) from BITS Pilani and B. E. in
Computer Engineering from M. M. M. Engineering College, Gorakhpur, UP. Dr.
Patel has been in teaching and research since 1991. He has supervised 36 M. Tech, 7 M.
Phil. and 8 PhD Thesis. He is currently supervising 6 PhD students. He has published
130 research papers in International/National Journals and Conferences. He is
member of ISTE (New Delhi), IEEE (USA). His current research interests are in
Mobile & Distributed Computing, Mobile Agent Security and Fault Tolerance and Sensor Network.
APPENDIX A – DATASETS USED FOR AEQARM-AAPDB SYSTEM
A.1 PDB_1 at site S_1
This is a real dataset of protein sequences, the Astral Structural Classification of Proteins (SCOP) [16], [17] version 1.75 genetic domain sequence subsets, based on PDB SEQRES records with less than 40% identity to each other. The SCOP database is a comprehensive ordering of all proteins of known structure according to their evolutionary and structural relationships. Protein domains in SCOP are grouped into species and hierarchically classified into families, superfamilies, folds and classes. In this dataset each protein record starts with a greater-than ('>') character followed by the protein description headers according to SCOP. An additional hash ('#') character is inserted as a separator between the protein description headers and the protein sequence of amino acids. A total of 3523 protein records are stored in this protein data bank (PDB_1) at site S_1.
A.2 FPDB_1 at site S_1
PDB_1, shown in Appendix A.1, is further filtered to generate FPDB_1 for the protein sequence length range ≥ 50 and < 400. A total of 3341 such filtered protein records are obtained.
A.3 AAF_1 generated from FPDB_1 at site S_1
Each column heading represents the single letter code of an amino acid. Each row represents the frequencies of the amino acids in a particular protein sequence record.
A.4 Amino acid frequency ranges (15 ranges for each amino acid)
The Item No. column represents serial numbers from 1 to 300. The data in the AA Frequency Range column represents the single letter code of an amino acid along with a frequency range. The frequency of each amino acid is divided into 15 partitions (ranges), resulting in 300 items for the 20 amino acids. For example, if the frequency of amino acid Alanine (A) in a protein record is 22, it lies at Item No. 8.
A.5 BDB_1 at site S_1
This Boolean data bank (BDB_1) is created using AAF_1, shown in Appendix A.3, and the amino acid frequency range table shown in Appendix A.4. Each column heading represents an Item No. as shown in Appendix A.4. Each amino acid has 15 frequency partitions; for each amino acid, a boolean value '1' is put under the Item No. in which the frequency of that amino acid lies in a particular protein record, and a value '0' otherwise. For example, it is clear from AAF_1 that the frequency of amino acid Alanine (A) in the 1st protein record is 22, and this frequency lies at Item No. 8 in the frequency range table shown in Appendix A.4, so a boolean value '1' is put under Item No. 8 in the 1st protein record in BDB_1.
A.6 IDB_1 at site S_1
This Itemset data bank (IDB_1) is created using BDB_1, shown in Appendix A.5. Each record in this dataset consists of 20 item numbers, one for each amino acid for which a boolean value '1' is stored in BDB_1.
APPENDIX B – RESULTANT KNOWLEDGE OF AEQARM-AAPDB SYSTEM
B.1 L^FI_k(1) and L^FISC_k(1) at site S_1
The list of frequent k-itemsets, i.e., L^FI_k(1), is represented by column L, and column SC shows the support count of the corresponding frequent k-itemset, i.e., L^FISC_k(1), at site S_1. These frequent itemsets and their support counts are obtained by processing the Itemset Data Bank (IDB_1) shown in Appendix A.6.
B.2 L^LSAR_1 for item numbers at site S_1
Column L represents the frequent k-itemsets and column AR (support, confidence) shows the list of locally strong association rules, i.e., L^LSAR_1, at site S_1. Each strong rule has its associated support and confidence factors. The minimum threshold support is taken as 20% and the minimum threshold confidence as 50% for generating the strong rules from the data shown in Appendix B.1.
B.3 L^LSAR_1 for the corresponding amino acid frequency ranges at site S_1
Replace each Item No. in the Appendix B.2 data with its corresponding amino acid frequency range as shown in Appendix A.4.