The document proposes a two-stage model for Parkinson's disease diagnosis that uses feature selection and data resampling techniques. In the first stage, a modified maximum relevance minimum redundancy (mRMR) feature selection approach is used based on Tabu Search to reduce the number of features. In the second stage, simple random sampling is applied to the dataset to increase the samples of the minority class. When evaluated on an LSVT dataset using 10-fold cross validation, the developed approach obtained a classification accuracy of 95% using 24 features.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISijcsit
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical
research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more
number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well
as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Machine Learning Based Approaches for Prediction of Parkinson's Disease mlaij
The prediction of Parkinson’s disease is most important and challenging problem for biomedical engineering researchers and doctors. The symptoms of disease are investigated in middle and late middle age. In this paper, minimum redundancy maximum relevance feature selection algorithms is used to select the most important feature among all the features to predict the Parkinson diseases. Here, it is observed that the random forest with 20 number of features selected by minimum redundancy maximum relevance feature selection algorithms provide the overall accuracy 90.3%, precision 90.2%, Mathews correlation coefficient values of 0.73 and ROC values 0.96 which is better in comparison to all other machine learning based approaches such as bagging, boosting, random forest, rotation forest, random subspace, support vector machine, multilayer perceptron, and decision tree based methods.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
The state of the art in behavioral machine learning for healthcareAfrica Perianez
The use of smart devices and wearables is becoming increasingly popular. This allows patients to be continuously monitored and provides a huge amount of health-related data that, if properly analyzed, can be used to improve their health by predicting potential future conditions. Advanced machine learning techniques do permit such analysis, and thus serve to forecast the evolution and health challenges of individual patients. This includes, for instance, issues as critical as early detection of heart disease.
But, moreover, the whole healthcare sector is currently undergoing a profound transformation. The rich profusion of digital data is fostering a move from more traditional approaches towards a data-driven prevention model.
In this talk I survey state of the art methods that allowed an AI-based early diagnosis and risk assessment for individual patients, using information that may include health records, genomic and wearable device data, medical imagery and online physician reviews. I will focus on methods that can be employed to forecast future events affecting a specific patient and serve to evaluate wearable device data and assist healthcare industry in undertaking a patient-focused data-driven preventive approach. Additionally, I introduce how machine-learning-based gamification techniques can be employed to motivate individual users to improve their health condition and achieve personalized challenges.
بعض (وليس الكل) ملخصات الأبحاث الجيدة المنشورة فى بعض المجلات الجيدة وفيها تنوع من الافكار الابحاث الابتكارية التى يخدم فيها علوم الحاسبات فيها - انها تطبيقات حياتية
An Enhanced Feature Selection Method to Predict the Severity in Brain Tumorijtsrd
The level of severity of brain tumor is captured through MRI and then assessed by the physician for their medical interpretation. The facts behind the MRI images are then analyzed by the physician for further medication and follow up activities. An MRI image composed of large volume of features. It has irrelevant, missing and information which is not certain. In medical data analysis, an MRI image doesn't express facts very clearly to the physician for correct interpretation all the time. It also includes huge amount of redundant information within it. A mathematical model known as rough set theory has been applied to resolve this problem by eliminating the redundancy in medical image data. This paper uses a rough set method to find the severity level of the brain tumor of the given MRI image. Rough set feature selection algorithms are applied over the medical image data to select the prominent features. The classification accuracy of the brain tumor can be improved to a better level by using this rough set approach. The prominent features selected through this approach deliver a set of decision rules for the classification task. A search method based on the particle swarm optimization is proposed in this paper for minimizing the attribute set. This approach is compared with previously existing rough set reduction algorithm for finding the accuracy. The reducts originated from the proposed algorithm is more efficient and can generate decision rules that will better classify the tumor types. The rule based method provided by the rough set method delivers classification accuracy in higher level than other smart methods such as fuzzy rule extraction, neural networks, decision trees and Fuzzy Networks like Fuzzy Min Max Neural Networks. Parthiban J | Dr. B. Sathees Kumar "An Enhanced Feature Selection Method to Predict the Severity in Brain Tumor" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26802.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26802/an-enhanced-feature-selection-method-to-predict-the-severity-in-brain-tumor/parthiban-j
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISijcsit
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical
research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more
number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well
as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Machine Learning Based Approaches for Prediction of Parkinson's Disease mlaij
The prediction of Parkinson’s disease is most important and challenging problem for biomedical engineering researchers and doctors. The symptoms of disease are investigated in middle and late middle age. In this paper, minimum redundancy maximum relevance feature selection algorithms is used to select the most important feature among all the features to predict the Parkinson diseases. Here, it is observed that the random forest with 20 number of features selected by minimum redundancy maximum relevance feature selection algorithms provide the overall accuracy 90.3%, precision 90.2%, Mathews correlation coefficient values of 0.73 and ROC values 0.96 which is better in comparison to all other machine learning based approaches such as bagging, boosting, random forest, rotation forest, random subspace, support vector machine, multilayer perceptron, and decision tree based methods.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
The state of the art in behavioral machine learning for healthcareAfrica Perianez
The use of smart devices and wearables is becoming increasingly popular. This allows patients to be continuously monitored and provides a huge amount of health-related data that, if properly analyzed, can be used to improve their health by predicting potential future conditions. Advanced machine learning techniques do permit such analysis, and thus serve to forecast the evolution and health challenges of individual patients. This includes, for instance, issues as critical as early detection of heart disease.
But, moreover, the whole healthcare sector is currently undergoing a profound transformation. The rich profusion of digital data is fostering a move from more traditional approaches towards a data-driven prevention model.
In this talk I survey state of the art methods that allowed an AI-based early diagnosis and risk assessment for individual patients, using information that may include health records, genomic and wearable device data, medical imagery and online physician reviews. I will focus on methods that can be employed to forecast future events affecting a specific patient and serve to evaluate wearable device data and assist healthcare industry in undertaking a patient-focused data-driven preventive approach. Additionally, I introduce how machine-learning-based gamification techniques can be employed to motivate individual users to improve their health condition and achieve personalized challenges.
بعض (وليس الكل) ملخصات الأبحاث الجيدة المنشورة فى بعض المجلات الجيدة وفيها تنوع من الافكار الابحاث الابتكارية التى يخدم فيها علوم الحاسبات فيها - انها تطبيقات حياتية
An Enhanced Feature Selection Method to Predict the Severity in Brain Tumorijtsrd
The level of severity of brain tumor is captured through MRI and then assessed by the physician for their medical interpretation. The facts behind the MRI images are then analyzed by the physician for further medication and follow up activities. An MRI image composed of large volume of features. It has irrelevant, missing and information which is not certain. In medical data analysis, an MRI image doesn't express facts very clearly to the physician for correct interpretation all the time. It also includes huge amount of redundant information within it. A mathematical model known as rough set theory has been applied to resolve this problem by eliminating the redundancy in medical image data. This paper uses a rough set method to find the severity level of the brain tumor of the given MRI image. Rough set feature selection algorithms are applied over the medical image data to select the prominent features. The classification accuracy of the brain tumor can be improved to a better level by using this rough set approach. The prominent features selected through this approach deliver a set of decision rules for the classification task. A search method based on the particle swarm optimization is proposed in this paper for minimizing the attribute set. This approach is compared with previously existing rough set reduction algorithm for finding the accuracy. The reducts originated from the proposed algorithm is more efficient and can generate decision rules that will better classify the tumor types. The rule based method provided by the rough set method delivers classification accuracy in higher level than other smart methods such as fuzzy rule extraction, neural networks, decision trees and Fuzzy Networks like Fuzzy Min Max Neural Networks. Parthiban J | Dr. B. Sathees Kumar "An Enhanced Feature Selection Method to Predict the Severity in Brain Tumor" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26802.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26802/an-enhanced-feature-selection-method-to-predict-the-severity-in-brain-tumor/parthiban-j
An optimized approach for extensive segmentation and classification of brain ...IJECEIAES
With the significant contribution in medical image processing for an effective diagnosis of critical health condition in human, there has been evolution of various methods and techniques in abnormality detection and classification process. An insight to the existing approaches highlights that potential amount of work is being carried out in detection and segmentation process but less effective modelling towards classification problems. This manuscript discusses about a simple and robust modelling of a technique that offers comprehensive segmentation process as well as classification process using Artificial Neural Network. Different from any existing approach, the study offers more granularities towards foreground/ background indexing with its comprehensive segmentation process while introducing a unique morphological operation along with graph-believe network for ensuring approximately 99% of accuracy of proposed system in contrast to existing learning scheme.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Disease prediction and doctor recommendation systemsabafarheen
This paper will tell you how the system will work in terms of disease prediction also will suggest you nearest hospital with experienced doctors, cheap fees
Patterns discovered from based on collected molecular profiles of patient tumour samples, and also clinical metadata, could be used to provide personalized cancer treatment to patients with
similar molecular subtypes. Computational algorithms for cancer diagnosis, prognosis, and therapeutics that can recognize specific functions and aid in classifiers based on a plethora of
publicly accessible cancer research outcomes are needed. Machine learning, a branch of artificial intelligence, has a great deal of potential for problem solving in cryptic cancer
datasets, as per a literature study. We focus on the new state of machine learning applications in cancer research in this study, illustrating trends and analysing major accomplishments,
roadblocks, and challenges along the way to clinic implementation. In the context of noninvasive treating cancer using diet-based and natural biomarkers, we propose a novel machine learning algorithm.
Hybrid System of Tiered Multivariate Analysis and Artificial Neural Network f...IJECEIAES
Improved system performance diagnosis of coronary heart disease becomes an important topic in research for several decades. One improvement would be done by features selection, so only the attributes that influence is used in the diagnosis system using data mining algorithms. Unfortunately, the most feature selection is done with the assumption has provided all the necessary attributes, regardless of the stage of obtaining the attribute, and cost required. This research proposes a hybrid model system for diagnosis of coronary heart disease. System diagnosis preceded the feature selection process, using tiered multivariate analysis. The analytical method used is logistic regression. The next stage, the classification by using multi-layer perceptron neural network. Based on test results, system performance proposed value for accuracy 86.3%, sensitivity 84.80%, specificity 88.20%, positive prediction value (PPV) 90.03%, negative prediction value (NPV) 81.80%, accuracy 86,30% and area under the curve (AUC) of 92.1%. The performance of a diagnosis using a combination attributes of risk factors, symptoms and exercise ECG. The conclusion that can be drawn is that the proposed diagnosis system capable of delivering performance in the very good category, with a number of attributes that are not a lot of checks and a relatively low cost.
Early Detection of Lung Cancer Using Neural Network TechniquesIJERA Editor
Effective identification of lung cancer at an initial stage is an important and crucial aspect of image processing. Several data mining methods have been used to detect lung cancer at early stage. In this paper, an approach has been presented which will diagnose lung cancer at an initial stage using CT scan images which are in Dicom (DCM) format. One of the key challenges is to remove white Gaussian noise from the CT scan image, which is done using non local mean filter and to segment the lung Otsu’s thresholding is used. The textural and structural features are extracted from the processed image to form feature vector. In this paper, three classifiers namely SVM, ANN, and k-NN are applied for the detection of lung cancer to find the severity of disease (stage I or stage II) and comparison is made with ANN, and k-NN classifier with respect to different quality attributes such as accuracy, sensitivity(recall), precision and specificity. It has been found from results that SVM achieves higher accuracy of 95.12% while ANN achieves 92.68% accuracy on the given data set and k-NN shows least accuracy of 85.37%. SVM algorithm which achieves 95.12% accuracy helps patients to take remedial action on time and reduces mortality rate from this deadly disease.
Parkinson’s diagnosis hybrid system based on deep learning classification wit...IJECEIAES
Brain degeneration involves several neurological troubles such as Parkinson’s disease (PD). Since this neurodegenerative disorder has no known cure, early detection has a paramount role in improving the patient’s life. Research has shown that voice disorder is one of the first symptoms detected. The application of deep learning techniques to data extracted from voice allows the production of a diagnostic support system for the Parkinson’s disease detection. In this work, we adopted the synthetic minority oversampling technique (SMOTE) technique to solve the imbalanced class problems. We performed feature selection, relying on the Chi-square feature technique to choose the most significant attributes. We opted for three deep learning classifiers, which are long-short term memory (LSTM), bidirectional LSTM (Bi-LSTM), and deep-LSTM (D-LSTM). After tuning the parameters by selecting different options, the experiment results show that the D-LSTM technique outperformed the LSTM and Bi-LSTM ones. It yielded the best score for both the imbalanced original dataset and for the balanced dataset with accuracy scores of 94.87% and 97.44%, respectively.
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
An optimized approach for extensive segmentation and classification of brain ...IJECEIAES
With the significant contribution in medical image processing for an effective diagnosis of critical health condition in human, there has been evolution of various methods and techniques in abnormality detection and classification process. An insight to the existing approaches highlights that potential amount of work is being carried out in detection and segmentation process but less effective modelling towards classification problems. This manuscript discusses about a simple and robust modelling of a technique that offers comprehensive segmentation process as well as classification process using Artificial Neural Network. Different from any existing approach, the study offers more granularities towards foreground/ background indexing with its comprehensive segmentation process while introducing a unique morphological operation along with graph-believe network for ensuring approximately 99% of accuracy of proposed system in contrast to existing learning scheme.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Disease prediction and doctor recommendation systemsabafarheen
This paper will tell you how the system will work in terms of disease prediction also will suggest you nearest hospital with experienced doctors, cheap fees
Patterns discovered from based on collected molecular profiles of patient tumour samples, and also clinical metadata, could be used to provide personalized cancer treatment to patients with
similar molecular subtypes. Computational algorithms for cancer diagnosis, prognosis, and therapeutics that can recognize specific functions and aid in classifiers based on a plethora of
publicly accessible cancer research outcomes are needed. Machine learning, a branch of artificial intelligence, has a great deal of potential for problem solving in cryptic cancer
datasets, as per a literature study. We focus on the new state of machine learning applications in cancer research in this study, illustrating trends and analysing major accomplishments,
roadblocks, and challenges along the way to clinic implementation. In the context of noninvasive treating cancer using diet-based and natural biomarkers, we propose a novel machine learning algorithm.
Hybrid System of Tiered Multivariate Analysis and Artificial Neural Network f...IJECEIAES
Improved system performance diagnosis of coronary heart disease becomes an important topic in research for several decades. One improvement would be done by features selection, so only the attributes that influence is used in the diagnosis system using data mining algorithms. Unfortunately, the most feature selection is done with the assumption has provided all the necessary attributes, regardless of the stage of obtaining the attribute, and cost required. This research proposes a hybrid model system for diagnosis of coronary heart disease. System diagnosis preceded the feature selection process, using tiered multivariate analysis. The analytical method used is logistic regression. The next stage, the classification by using multi-layer perceptron neural network. Based on test results, system performance proposed value for accuracy 86.3%, sensitivity 84.80%, specificity 88.20%, positive prediction value (PPV) 90.03%, negative prediction value (NPV) 81.80%, accuracy 86,30% and area under the curve (AUC) of 92.1%. The performance of a diagnosis using a combination attributes of risk factors, symptoms and exercise ECG. The conclusion that can be drawn is that the proposed diagnosis system capable of delivering performance in the very good category, with a number of attributes that are not a lot of checks and a relatively low cost.
Early Detection of Lung Cancer Using Neural Network TechniquesIJERA Editor
Effective identification of lung cancer at an initial stage is an important and crucial aspect of image processing. Several data mining methods have been used to detect lung cancer at early stage. In this paper, an approach has been presented which will diagnose lung cancer at an initial stage using CT scan images which are in Dicom (DCM) format. One of the key challenges is to remove white Gaussian noise from the CT scan image, which is done using non local mean filter and to segment the lung Otsu’s thresholding is used. The textural and structural features are extracted from the processed image to form feature vector. In this paper, three classifiers namely SVM, ANN, and k-NN are applied for the detection of lung cancer to find the severity of disease (stage I or stage II) and comparison is made with ANN, and k-NN classifier with respect to different quality attributes such as accuracy, sensitivity(recall), precision and specificity. It has been found from results that SVM achieves higher accuracy of 95.12% while ANN achieves 92.68% accuracy on the given data set and k-NN shows least accuracy of 85.37%. SVM algorithm which achieves 95.12% accuracy helps patients to take remedial action on time and reduces mortality rate from this deadly disease.
Parkinson’s diagnosis hybrid system based on deep learning classification wit...IJECEIAES
Brain degeneration involves several neurological troubles such as Parkinson’s disease (PD). Since this neurodegenerative disorder has no known cure, early detection has a paramount role in improving the patient’s life. Research has shown that voice disorder is one of the first symptoms detected. The application of deep learning techniques to data extracted from voice allows the production of a diagnostic support system for the Parkinson’s disease detection. In this work, we adopted the synthetic minority oversampling technique (SMOTE) technique to solve the imbalanced class problems. We performed feature selection, relying on the Chi-square feature technique to choose the most significant attributes. We opted for three deep learning classifiers, which are long-short term memory (LSTM), bidirectional LSTM (Bi-LSTM), and deep-LSTM (D-LSTM). After tuning the parameters by selecting different options, the experiment results show that the D-LSTM technique outperformed the LSTM and Bi-LSTM ones. It yielded the best score for both the imbalanced original dataset and for the balanced dataset with accuracy scores of 94.87% and 97.44%, respectively.
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to
analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection based ensemble learning models is to classify the high dimensional data with high computational efficiency and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms.This technique gives an accuracy of 98%.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many
biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon
cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies
in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in
their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms
and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the
matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix
Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification
accuracies are then compared for these algorithms.This technique gives an accuracy of 98%
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In many diseases classification an accurate gene analysis is needed, for which selection of most informative genes is very important and it require a technique of decision in complex context of ambiguity. The traditional methods include for selecting most significant gene includes some of the statistical analysis namely 2-Sample-T-test (2STT), Entropy, Signal to Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using structured complex decision technique (SCDT) and classifies it using fuzzy cluster based nearest neighborclassifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated for leave one out cross validation metric(LOOCV) along with sensitivity, specificity, precision and F1-score with four different classifiers namely 1) Radial Basis Function (RBF), 2) Multi-layer perception(MLP), 3) Feed Forward(FF) and 4) Support vector machine(SVM) for three different datasets of DLBCL, Leukemia and Prostate tumor. The proposed SCDT &FC-NNC exhibits superior result for being considered more accurate decision mechanism.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The biomedical profession has gained importance due to the rapid and accurate diagnosis of clinical patients using computer-aided diagnosis (CAD) tools.
The diagnosis and treatment of Alzheimer’s disease (AD) using complementary multimodalities can improve the quality of life and mental state of patients.
In this study, we integrated a lightweight custom convolutional neural network
(CNN) model and nature-inspired optimization techniques to enhance the performance, robustness, and stability of progress detection in AD. A multi-modal
fusion database approach was implemented, including positron emission tomography (PET) and magnetic resonance imaging (MRI) datasets, to create a fused
database. We compared the performance of custom and pre-trained deep learning models with and without optimization and found that employing natureinspired algorithms like the particle swarm optimization algorithm (PSO) algorithm significantly improved system performance. The proposed methodology,
which includes a fused multimodality database and optimization strategy, improved performance metrics such as training, validation, test accuracy, precision, and recall. Furthermore, PSO was found to improve the performance of
pre-trained models by 3-5% and custom models by up to 22%. Combining different medical imaging modalities improved the overall model performance by
2-5%. In conclusion, a customized lightweight CNN model and nature-inspired
optimization techniques can significantly enhance progress detection, leading to
better biomedical research and patient care.
Ataxic person prediction using feature optimized based on machine learning modelIJECEIAES
Ataxic gait monitoring and assessment of neurological disorders belong to important areas that are supported by digital signal processing methods and artificial intelligence (AI) techniques such as machine learning (ML) and deep learning (DL) techniques. This paper uses spatio-temporal data from Kinect sensor to optimize machine learning model to distinguish between ataxic and normal gait. Existing ML-based methodologies fails to establish feature correlation between different gait parameters; thus, exhibit very poor performance. Further, when data is imbalanced in nature the existing ML-based methodologies induces higher false positive. In addressing the research issues this paper introduces an extreme gradient boost (XGBoost)based classifier and enhanced feature optimization (EFO) by modifying the standard cross validation (SCV) mechanism. Experiment outcome shows the proposed ataxic person identification model achieves very good result in comparison with existing ML-based and DL-based ataxic person identification methodologies.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
May 2024 - Top 10 Read Articles in Artificial Intelligence and Applications (...gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
3rd International Conference on Artificial Intelligence Advances (AIAD 2024)gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the area advanced Artificial Intelligence. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the research area. Core areas of AI and advanced multi-disciplinary and its applications will be covered during the conferences.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Information Extraction from Product Labels: A Machine Vision Approachgerogepatton
This research tackles the challenge of manual data extraction from product labels by employing a blend of
computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional
Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by
incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition
(OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food
Facts API (Application Programming Interface) for database population and text-only label prediction.
The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set.
Importantly, our approach enables visually impaired individuals to access essential information on
product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep
learning and OCR in automating label extraction and recognition.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Research on Fuzzy C- Clustering Recursive Genetic Algorithm based on Cloud Co...gerogepatton
Aiming at the problems of poor local search ability and precocious convergence of fuzzy C-cluster
recursive genetic algorithm (FOLD++), a new fuzzy C-cluster recursive genetic algorithm based on
Bayesian function adaptation search (TS) was proposed by incorporating the idea of Bayesian function
adaptation search into fuzzy C-cluster recursive genetic algorithm. The new algorithm combines the
advantages of FOLD++ and TS. In the early stage of optimization, fuzzy C-cluster recursive genetic
algorithm is used to get a good initial value, and the individual extreme value pbest is put into Bayesian
function adaptation table. In the late stage of optimization, when the searching ability of fuzzy C-cluster
recursive genetic is weakened, the short term memory function of Bayesian function adaptation table in
Bayesian function adaptation search algorithm is utilized. Make it jump out of the local optimal solution,
and allow bad solutions to be accepted during the search. The improved algorithm is applied to function
optimization, and the simulation results show that the calculation accuracy and stability of the algorithm
are improved, and the effectiveness of the improved algorithm is verified
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
10th International Conference on Artificial Intelligence and Soft Computing (...gerogepatton
10th International Conference on Artificial Intelligence and Soft Computing (AIS 2024) will
provide an excellent international forum for sharing knowledge and results in theory, methodology, and
applications of Artificial Intelligence, Soft Computing. The Conference looks for significant
contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical
aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from
both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
Employee attrition refers to the decrease in staff numbers within an organization due to various reasons.
As it has a negative impact on long-term growth objectives and workplace productivity, firms have
recognized it as a significant concern. To address this issue, organizations are increasingly turning to
machine-learning approaches to forecast employee attrition rates. This topic has gained significant
attention from researchers, especially in recent times. Several studies have applied various machinelearning methods to predict employee attrition, producing different resultsdepending on the employed
methods, factors, and datasets. However, there has been no comprehensive comparative review of multiple
studies applying machine-learning models to predict employee attrition to date. Therefore, this study aims
to fill this gap by providing an overview of research conducted on applying machine learning to predict
employee attrition from 2019 to February 2024. A literature review of relevant studies was conducted,
summarized, and classified. Most studies agree on conducting comparative experiments with multiple
predictive models to determine the most effective one.From this literature survey, the RF algorithm and
XGB ensemble method are repeatedly the best-performing, outperforming many other algorithms.
Additionally, the application of deep learning to employee attrition prediction issues also shows promise.
While there are discrepancies in the datasets used in previous studies, it is notable that the dataset
provided by IBM is the most widely utilized. This study serves as a concise review for new researchers,
facilitating their understanding of the primary techniques employed in predicting employee attrition and
highlighting recent research trends in this field. Furthermore, it provides organizations with insight into
the prominent factors affecting employee attrition, as identified by studies, enabling them to implement
solutions aimed at reducing attrition rates.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AIFU 2024) is a forum for presenting new advances and research results in the fields of Artificial Intelligence. The conference will bring together leading researchers, engineers and scientists in the domain of interest from around the world. The scope of the conference covers all theoretical and practical aspects of the Artificial Intelligence.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
THE TRANSFORMATION RISK-BENEFIT MODEL OF ARTIFICIAL INTELLIGENCE:BALANCING RI...gerogepatton
This paper summarizes the most cogent advantages and risks associated with Artificial Intelligence from an
in-depth review of the literature. Then the authors synthesize the salient risk-related models currently being
used in AI, technology and business-related scenarios. Next, in view of an updated context of AI along with
theories and models reviewed and expanded constructs, the writers propose a new framework called “The
Transformation Risk-Benefit Model of Artificial Intelligence” to address the increasing fears and levels of
AIrisk. Using the model characteristics, the article emphasizes practical and innovative solutions where
benefitsoutweigh risks and three use cases in healthcare, climate change/environment and cyber security to
illustrate unique interplay of principles, dimensions and processes of this powerful AI transformational
model.
13th International Conference on Software Engineering and Applications (SEA 2...gerogepatton
13th International Conference on Software Engineering and Applications (SEA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Software Engineering and Applications. The goal of this conference is to bring together researchers and practitioners from academia and industry to focus on understanding Modern software engineering concepts and establishing new collaborations in these areas.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATIONgerogepatton
Complicated policy texts require a lot of effort to read, so there is a need for intelligent interpretation of
Chinese policies. To better solve the Chinese Text Summarization task, this paper utilized the mT5 model
as the core framework and initial weights. Additionally, In addition, this paper reduced the model size
through parameter clipping, used the Gap Sentence Generation (GSG) method as unsupervised method,
and improved the Chinese tokenizer. After training on a meticulously processed 30GB Chinese training
corpus, the paper developed the enhanced mT5-GSG model. Then, when fine-tuning the Chinese Policy
text, this paper chose the idea of “Dropout Twice”, and innovatively combined the probability distribution
of the two Dropouts through the Wasserstein distance. Experimental results indicate that the proposed
model achieved Rouge-1, Rouge-2, and Rouge-L scores of 56.13%, 45.76%, and 56.41% respectively on
the Chinese policy text summarization dataset.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASED ON TABU SEARCH FOR PARKINSON’S DISEASE MINING
1. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
DOI: 10.5121/ijaia.2020.11201 1
A MODIFIED MAXIMUM RELEVANCE MINIMUM
REDUNDANCY FEATURE SELECTION
METHOD BASED ON TABU SEARCH FOR
PARKINSON’S DISEASE MINING
Waheeda Almayyan
Computer Information Department, Collage of Business Studies, PAAET, Kuwait
ABSTRACT
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of
the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients
usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent
research trends aim to investigate the potential of using sustained vowel phonations for replicating the
speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the
accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis
model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum
Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the
features numbers. Secondly, we apply simple random sampling technique to dataset to increase the
samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate
of 95% with 24 features by 10-fold CV method.
KEYWORDS
Parkinson’s disease; Medical data mining; maximum Relevance Minimum Redundancy, Tabu Search&
Simple Random Sampling.
1. INTRODUCTION
Parkinson’s disease (PD) is a complex progressive, degenerative universal disorder of the central
nervous system. PD is the second most common neurodegenerative disorder in elderly.
According to the World Health Organization (WHO), it was estimated that there are about 7-10
million PD patients in the world [1]. Two percent of the population above the age of 60 is
affected by the PD. It ultimately leads to several motor and non-motor manifestations. The cause
of PD is still unknown and early signs may be mild and unnoticeable, the diagnoses of PD disease
mainly relies on clinical criteria [2]. Research has shown that approximately 90 percent of PD
patients face some form of vocal deficiency [3].
Classification systems play a major role in machine learning and knowledge-based systems. In
the last 30 years Computer-Aided Diagnosis (CAD) has become one of the vital research topics
in medical diagnostic tasks. Medical practitioners depend upon the experience beside the existing
information, whereas, CAD usually apply intelligent machine learning techniques to help
physicians in making decisions. A number of published articles suggested several strategies to
process the physician’s interpretation and decisions [4]. During classification process, accuracy
2. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
2
of clinically diagnosed cases is particularly important issue to be considered. But due to the
voluminous and heterogeneous nature of medical data: classification techniques are required to
make accurate decisions. The size of most medical datasets is normally large-scaled; accordingly,
this matter effects the complexity of the data mining process [5].Therefore, dimension reduction
techniques play a major role in excluding irrelevant features from medical datasets [6]. Therefore,
instead of using all existing features, dimension reduction procedure aims to reduce
computational complexity with the possible advantages of enhancing the overall classification
performance and eventually reduce the computational cost.
This article proposes a two-stage feature selection algorithm to assess the PD habitation speech
treatment based on Tabu and minimum redundancy - maximum relevance (mRMR) search
techniques. Firstly, we select the first set of features by applying Tabu Search to construct a first
reductive feature set. Secondly, we apply a modified mRMR feature selection approach, based on
Cuckoo Search to further reduce the dimension and to discover the important relationship
between features. Finally, we apply the random sampling technique to handle the imbalanced
distribution of the medical data. This paper attempts to extend the work of [7] to deal with the
limited volume of data and improve the accuracy of the prediction event.
The reminder of this article is organized in the following manner; introduction and related
research are briefly described in sections 1 and 2. Section 3 explains the theoretical approach of
feature selection methods and the proposed technique. The dataset, evaluation procedure and the
experimental results are presented in sections 4 and 5. Finally, conclusion is summarized in
Section 6.
2. RELATED WORK
Several studies have been carried out by numerous researchers on the early diagnosis of PD based
on machine learning methods [8-20]. Little and his co-researchers conducted an important study
to detect PD by introducing a new measure of dysphonia [8]. This investigation has presented a
new dysphonia measure, known as the pitch period entropy. They employed Support Vector
Machine (SVM) classifier with Gaussian radial basis kernel functions to predict PD and
performed feature selection to select the optimal subset of features. Their best overall accuracy
rate was 91.4%. In 2010, Das compare Neural Networks (ANN), DMneural, Regression and
Decision Tree to predict PD. The results have shown that the ANN classifier approach
outperformed the other two models with an overall accuracy of 92.9% [9]. Authors in [10]
introduced a nonlinear model based on Dirichlet mixtures. The experimental results showed that
the suggested model outperformed the multinomial logit models, decision trees and SVM with an
overall accuracy of 87.7%.
Sakar and Kursun (2010) selected a minimal subset of dysphonia features according to the
Maximum Relevance Minimum Redundancy (mRMR) criterion. An accuracy of 92.75% with the
SVM classifier was obtained [11]. Psorakis et al. (2010) introduced sample selection strategies,
novel convergence measures and model improvements for multiclass multi-kernel relevance
vector machines (mRVMs), and finally, the improved mRVMs achieved the classification
accuracy rate of 89.47% [12]. Researchers in [13] built a classification model based on genetic
programming and the expectation maximization algorithm to detect PD. According to the
experimental results the best classification accuracy obtained during the experiments was 93.1%.
Luukka (2011) suggested a feature selection model based on fuzzy entropy measures together
with the similarity classifier to predict PD and an additional three medical datasets. The mean
classification accuracy of predicting PD was 85.03% with only two features from original 22
features [14].
3. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
3
Authors in [15] achieved an accuracy of 93.47% with a model based on non-linear fuzzy
transformation method in combination with the principal component analysis together with SVM
classifier. Ozcift and Gulten (2011) started with reducing feature set with the correlation-based
feature selection (CFS) algorithm. Then, they applied the classification of the rotation forest
ensemble classifiers that composed of 30 machine-learning algorithms to achieve an average
accuracy of 87.13% [16]. For the prediction of PD, AStröm and Koker (2011) proposed 9 parallel
feed-forward NN structure. The output of each neural network was evaluated using a rule-based
system to make final decision. Moreover, in the proposed approach unlearned data of each neural
network during the training process is collected and used in the training set of the next neural
network. The highest reported classification accuracy was 91.20% [17]. Researchers in [18]
applied evolutionary-based techniques in combination with the Optimum-Path Forest classifier to
reduce the number of features before detecting PD. The best classification accuracy of 84.01%
was reported using Gravitational Search Algorithm technique with eight features. Sriram et al.
(2015) deployed Sieve multigram data and Survey graph to obtain the statistical analysis on the
voice data. Then they performed clustering using the KStar and NNge classifiers [19]. Recently,
with the help of data mining and bioinformatics analysis, several potential therapeutic drugs that
may be used for prevention and treatment of Parkinson’s disease were discovered [20]. This
certainly open new perspectives for drugs development using AI techniques.
From this quick review, we can notice that most of the well-known classifiers models from
machine learning have been utilized to improve the technological assessments of PD. So,
choosing an excellent classifier is of significant importance to the tackle the PD classification
problem. Also, we have noticed that much work is needed to tackle the imbalanced nature of
medical datasets. Therefore, we will study the impact of the data pre-processing strategy that uses
data sampling after applying several attribute selection techniques, on the classification models
constructed. Aiming at improving the efficiency and effectiveness of the classification accuracy
for PD diagnosis with minimum number of features, in this article, a classification system based
on two-level of feature extraction is introduced. In this work we will focus on choosing the
appropriate features to describe the feature set based on Random Forest as a base classifier.
3. THE PROPOSED APPROACH
Experts address the rapidly rising number of deaths from PD. New study predicted that by the
year 2030 there will be around 1.2 million people living with PD in the United States [21]. A
practically collected LSVT dataset [8] is used in this article to demonstrate the proposed
procedure. More details on the dataset will be is introduced in the next section. The proposed
method for the purpose of diagnosis of PD applied in this study is illustrated in Figure 1. The
main objective of the proposed approach is to explore the performance of PD diagnosis using a
two-stage hybrid modelling procedure through integrating mRMR with Tabu Search technique.
Firstly, the proposed method adopts mRMR to construct the discriminative feature space through
Cuckoo search algorithm, then the Tabu technique is applied to help in improving the
discrimination ability of classifiers. In the last step the resulted feature space is processed by the
random resampling technique and ready to be evaluated with different types of activation
functions to perform the classification.
Eventually, five machine-learning classification methods, which are considered very robust in
solving non-linear problems, are chosen to estimate the PD possibility. These methods include
Random Forest (RF), Fuzzy Unordered Rule Induction Algorithm (FURIA), AdaBoost, Rough
sets and C4.5. In order to evaluate the prediction performance of the suggested model, we used
six performance metrics, Sensitivity, Specificity, Accuracy, Precision, 𝑓-measure, and MCC, to
test the classification performance.
4. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
4
Figure.1. The proposed methodology
3.1. mRMR feature selection method based on Cuckoo search
The mRMR method helps in extracting crucial features and this can minimize the classification
error [22]. The mRMR is a heuristic technique approach proposed by Peng et al. to measure the
relevancy and redundancy of features and determine the informative features [23]. Several
experimental results denoted that mRMR is an effective method to improve the performance of
feature selection [24,25]. The mRMR feature selection technique is meant to identify a set of
crucial features that have maximum relevancy for target classes and minimum redundancy with
other features in the dataset at the same time.
The basic concept of mRMR method is to use two mutual information MI operations: one
between classes and each feature in order to measure the relevancy, while the second mutual
information between every two features to evaluate the redundancy. 𝑆 denotes the selected
features and Rl measures the relevancy of a group of selected features 𝑆 that can be defined as
follows
𝑅𝑙 =
1
|𝑆|
∑ 𝐼( 𝐺𝑥, 𝐶)𝐺 𝑥∈𝑆 1
where 𝐼(𝐺𝑥,𝐶) denoted the value of mutual information between an individual feature 𝐺𝑥 that
belongs to 𝑆 and the target class 𝐶. When the selected features have the maximum relevance Rl
value, it is possible to have high redundancy between these features. Hence, the redundancy Rd
of a group of selected features 𝑆 is defined as
𝑅𝑑 =
1
|𝑆|2
∑ 𝐼(𝐺 𝑥, 𝐺 𝑦)𝐺 𝑥,𝐺 𝑦∈𝑆 2
Where |S| is the number of feature in S and I(𝐺𝑥 ,𝐺𝑦) is the mutual information between the 𝑥th
and 𝑦th features that measures the mutual dependency of these two features.
LSVTDataset
1st Feature Selection
Procedure
2nd Feature Selection
Procedure
Resample Data
Classification
Procedure
Performance
Measures
Sensitivity
Specificity
Precision
Accuracy
MCC
𝑓-measure
RF
FURIA
Ada Boost
C4.5
Rough Sets
Random Resample
mRMR
Tabu
5. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
5
The main purpose of applying the mRMR feature selection method is to find a subset of features
from 𝑆 with 𝑚 features, { 𝑥𝑖}, that either jointly have the largest dependency on the target class C
or have the minimal redundancy on the selected features subset 𝑆. Thus, Peng et al. [23]
recommend searching for balanced solutions through the composite objective. This criterion
combines the two criteria, which are maximal relevance criterion and minimal redundancy
criterion, as follows
max (Rl, Rd) = Rl – Rd 3
Our goal is to increase the prediction accuracy and reduce the number of selected features.
Thereafter, features are selected one by one by applying Cuckoo search to maximize the objective
function, which is a function of relevance and redundancy. Hence, we applied the mRMR method
to filter irrelevant and noisy features and eventually reduces the computational load.
3.2. Tabu Search
Tabu Search is a memory-based metaheuristic algorithm proposed by Glover in 1986 to solve
combinatorial optimization problems [26,27]. Since then, Tabu Search has been successfully
applied in other feature selection problems [28]. This search technique is a local neighborhood
search algorithm that simulates the optimal characteristics of human memory functions. Tabu
Search involves a local search combined with a tabu mechanism. It starts with an initial feasible
solution X’ among the neighborhood solutions, where is the set of feasible solutions, and at
each iteration, the algorithm searches the neighborhood of the best solution N(X) to obtain a
new one with an improved functional value. A solution X’ N(X) can be reached from X in
two cases, X’ is not included in the tabu list; and X’ is included in the tabu list, but it satisfies the
aspiration criterion [29]. Surely, if the new solution 𝑋’ is superior to 𝑋best, the value of 𝑋best is
overridden. To avoid cycling, solutions that were previously visited are declared forbidden or
tabu for a certain number of iterations and this surely improve the performance of the local
search. Then, the neighborhood search is resumed based on the new feasible solution 𝑋’. This
whole procedure is iteratively executed until the stopping criteria is met. After the iterative
process has terminated, the current best solution so far 𝑋best is considered the final optimal
solution provided by the Tabu Search method [30].
Begin
Start with an initial feasible solution X
Initialize tabu list and aspiration level
For fixed number of iterations Do
Generate neighbour solutions
Find best 𝑋best
If move from X to X’ is not in tabu list Then
Accept move and update solution
Update tabu list and aspiration level
Increment iteration number
Else
If Cost(X’)< aspiration level
Accept move and update best solution
Update tabu list and aspiration level
Increment iteration number
End If
End If
End For
End
Pseudocode 1: Pseudocode of of Tabu Search (TS) (Sait and Youssef, 1999[31]).
6. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
6
3.3. Simple Random Sampling
Class imbalance problems occurs when one class is represented by a significantly larger number
of instances than other classes. Consequently, classification algorithms tend to ignore the
minority classes. Simple random sampling has been recommended as a method to increase the
sensitivity of the classification algorithm to the minority class by scaling the class distribution
[32,33]. Changing the class distribution can be conducted via different resampling strategies.
However, the simple random sampling technique has gained extra attention. The advantage of a
such technique is that it is external and therefore, easily transportable as well as very simple to
implement [34]. Moreover, over-sampling the minority class data avoids unnecessary information
loss [35].
Weiss and Provost conducted an empirical study where the authors used twenty datasets from
UCI repository has showed quantitatively that classifier accuracy might be increased with a
progressive sampling algorithm [32]. They used decision trees to evaluate classification
performances with the use of a sampling strategy. Another important study used simple random
sampling to scale the class distribution of biomedical datasets [33]. The authors measure the
effect of the suggested sampling strategy by the use of nearest neighbour and decision tree
classifiers. In simple random sampling, a sample is randomly selected from the population so that
the obtained sample is representative of the population. Therefore, this technique provides an
unbiased sample from the original data.
Regarding simple random sampling there are two approaches while making random selection, in
the first approach the samples are selected with replacement where the sample can be selected
more than once repeatedly with an equal selection chance. In the other approach the selection of
samples is done without replacement where the sample can be selected only once, so that each
sample in the dataset has an equal chance of being selected and once selected it cannot be chosen
again [36].
4. DATASET AND EVALUATION PROCEDURE
4.1. LSVT Dataset
Tracking the early Parkinson's disease symptom progression using the speech disorders has
shown a great sign in the advancement of Parkinson Disease detection. About 90% of people with
Parkinson’s disease present some kind of vocal deterioration. In this article, we use the dataset
originally collected by Tsanas et al. [7] to analyze the impact of LSVT (Lee Silverman Voice
Treatment) in the voice rehabilitation of 14 patients with PD. LSVT is an intensive speech
therapy program designed to improve respiratory, laryngeal, and articulatory functions during
speech in patients with Parkinson's disease [37]. Tsanas et al. measured 309 dysphonia features to
evaluate whether a sustained phonation is “acceptable” or “unacceptable” according to the
clinical criteria of six experts. Among the 126 phonations, 42 is labeled as “acceptable” while the
remaining 84 is labeled as “unacceptable”. The authors reported a classification score of 90%
considering a voting scheme. RF and SVM.
Table 1 describes the class distribution, which clearly shows that the dataset is imbalanced. Out
of 84phonations, 66.67% classified as unacceptable. A common problem with the imbalanced
data is that the minority class contributes very little to the standard algorithm accuracy.
7. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
7
Table 1.Class distribution of the LSVT dataset
Index Class name Class size Class distribution
(%)
1 acceptable 42 33.33
2 unacceptable 84 66.67
4.2. Performance Analysis
When learning of imbalanced data, the measures such as Accuracy and Error Rate usually favors
the majority class [38]. By using the 10-fold cross validation procedure, the accuracy of the
selected classifiers was calculated using sensitivity, specificity, accuracy, precision, 𝑓-measure
and MCC which are more appropriate measures for imbalanced datasets [39]. The main
formulations are defined in Equations 4-10, according to the confusion matrix, which is shown in
Table 2. In the confusion matrix of a two-class problem, TP is the number of true positives that
represent in our case the cases with Parkinson’s disease that was classified correctly. FN is the
number of false negatives that represents the cases with Parkinson’s disease that was classified
incorrectly as healthy. TN is the number of true negatives, which represents the healthy cases that
was classified as healthy. FP is the number of false positives that represents the healthy cases that
was classified as Parkinson’s disease cases.
Table 2. The confusion matrix
Hypothesis Predicted patient state
Classified case with
Parkinson’s disease
Classified healthy case
Hypothesis positive
Parkinson’s disease case
True Positive
TP
False Negative
FN
Hypothesis negative Healthy
case
False Positive
FP
True Negative
TN
The ROC (Receiver Operating Characteristic) curve is a graphical representation of the
sensitivity versus the specificity index for a classifier varying the discrimination threshold value.
The ROC curve is a standard tool for summarizing classifier performance over a range of trade-
offs between TP and FP error rates [40]. ROC usually takes values between 0.5 for random
drawing and 1.0 for perfect classifier performance.
Accuracy=
𝑇𝑃+𝑇𝑁
𝑇𝑃+𝐹𝑃+𝐹𝑁+𝑇𝑁
4
The 𝑓-measure is therefore used as it measures the harmonic mean of the classifier’s precision
and Recall as
𝑓-measure =2 ×
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛×𝑅𝑒𝑐𝑎𝑙𝑙
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛+𝑅𝑒𝑐𝑎𝑙𝑙
5
Accordingly, we can define Precision and Recall as
Precision=
𝑇𝑁
𝐹𝑃+𝑇𝑁
6
Recall=
𝑇𝑃
𝑇𝑃+𝐹𝑁
7
8. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
8
Considering class-imbalanced datasets, the Matthews Correlation Coefficient (MCC) is an
appropriate measure that considered balanced. It can be used even if the classes are of very
different in sizes, as it is a correlation coefficient between the observed and predicted
classification decisions. The MCC measure falls within the range of [−1,1]. The larger the MCC
coefficient indicates better classifier prediction. The MCC measure can be calculated directly
from the confusion matrix using the following formula
MCC=
𝑇𝑃.𝑇𝑁−𝐹𝑁.𝐹𝑃
√(𝑇𝑃+𝐹𝑁)(𝑇𝑃+𝐹𝑃)(𝑇𝑁+𝐹𝑁)(𝑇𝑁+𝐹𝑃)
8
Sensitivity and specificity measures can be also used to improve interpretability as follows
Sensitivity=
𝑇𝑃
𝑇𝑃+𝐹𝑁
9
Specificity=
𝑇𝑁
𝑇𝑁+𝐹𝑃
10
5. RESULTS AND DISCUSSION
One of the interesting aspects in biomedical data mining is to build computational models with
abilities to extract hidden knowledge using data mining schemes. The number of trees for RF
algorithm and decision trees for Adaboost was arbitrarily set to 100, since it has been shown that
the optimal number of trees is usually 64–128, while further increasing the number of trees does
not necessarily improve the model’s performance [41]. The MINNO (minimum total weight of
the instances in a rule) of the FURIA algorithm has been set to 2.0. The performance results are
the average after 10 runs of 10 folds-validation on each classifier to obtain its prediction
measures. The suggested model for the purpose of evaluating LSVT data applied in this study is
carried out in two major phases. In order to verify the effectiveness of the proposed model, firstly
we compare with the performance of the selected classifier on the original feature space. Table 3,
shows the classification results on the original feature space. It can be observed that the
classification Accuracy fluctuates between 75% and 84% with the whole feature set. RF has
achieved the results of 94%, 64.3%, 84%, 88.8% and 66.8% in terms of Sensitivity, Specificity,
Precision, 𝑓-measure and MCC.
Table 3. Performance measures on the original features of LSVT dataset
Classifier Sensitivi
ty
Specifici
ty
Accurac
y
Precision f-
measure
MCC
RF 0.940 0.643 0.841 0.840 0.888 0.668
FURIA 0.940 0.619 0.833 0.832 0.883 0.651
RoughSet 0.917 0.714 0.849 0.865 0.890 0.672
AdaBoost 0.833 0.595 0.754 0.805 0.819 0.445
C4.5 0.845 0.571 0.754 0.798 0.821 0.444
9. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
9
Table 4. Selected features of LSVT dataset
FS technique Number
of selected
features
Selected features labels
1. Tabu 37 Jitter->F0_TKEO_prc75
Jitter->pitch_TKEO_std
Shimmer-
>Ampl_TKEO_mean
Shimmer-
>Ampl_TKEO_prc5
Shimmer-
>Ampl_TKEO_prc25
Shimmer-
>Ampl_TKEO_prc95
OQ->prc5_95
DFA
GNE->NSR_TKEO
GNE->NSR_SEO
VFER->mean
IMF->SNR_entropy
IMF->NSR_SEO
IMF->NSR_entropy
MFCC_2nd coef
MFCC_3rd coef
MFCC_6th coef
MFCC_7th coef
MFCC_8th coef
MFCC_10th coef
1st delta
2nd delta
5th delta-delta
Ea
Ed_8_coef
Ed_9_coef
Ed2_9_coef
entropy_shannon4_9_coef
entropy_log4_6_coef
entropy_log4_7_coef
entropy_log4_8_coef
det_TKEO_mean4_10_coef
det_TKEO_std4_3_coef
det_TKEO_std4_4_coef
det_TKEO_std4_5_coef
det_TKEO_std4_6_coef
det_TKEO_std4_9_coef
2. Tabu-
mRMR
8 Jitter->F0_TKEO_prc75
Shimmer-
>Ampl_TKEO_prc25
DFA
IMF->NSR_SEO
MFCC_2nd coef
MFCC_3rd coef
Ea
Ed_8_coef
Next, this reduced dataset is presented to the proposed approach that further optimizes the
dimensions of the data and finds an optimal set of features. At the end of this step a subset of
features is chosen for the next round. The optimal features by the Tabu and mRMR techniques
are listed in Table 4. It is worth noting that the number of features has remarkably reduced,
compared with mRMR technique, therefore less storage space is required for the execution of the
classification algorithms. In this phase we reduced the size of LSVT features from 309 to only 8
features. Table 5, shows the classification results of the first step. It can be observed that the
classification Accuracy using Tabu technique varies between 76% and 82% with the selected 37
10. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
10
feature set. In this step RF has achieved the results of 92.9%, 61.9%, 83%, 87.6% and 62.7%, in
terms of Sensitivity, Specificity, Precision, 𝑓-measure and MCC.
Table 5. Performance measures after first RF
Classifier Sensitivit
y
Specificit
y
Accuracy Precisio
n
f-
measure
MCC
RF 0.929 0.619 0.825 0.830 0.876 0.627
FURIA 0.893 0.619 0.802 0.824 0.857 0.561
RoughSet 0.821 0.643 0.762 0.821 0.821 0.464
AdaBoost 0.857 0.714 0.810 0.857 0.857 0.571
C4.5 0.881 0.632 0.803 0.841 0.860 0.542
In the next step, mRMR technique is applied on the selected features and the resulted feature set
is used as the inputs to the classifiers. In Table 6, we depict the performance of the classifiers
after applying the second step of reduction. From this table it is noticed that the highest Accuracy
rate is associated with RF classifier was 84.1% with 8 features. In Table 7, we can see the
comparative results of the classification performance of the second phase that deploy mRMR
algorithm to detect the most significant features. Clearly, we can observe that the mRMR helped
in reducing the dimension of features. Yet, this step did not improve the overall classification
performance.
Table 6. Performance measures of second stage RF
Classifier Sensitivit
y
Specificit
y
Accuracy Precisio
n
f-
measure
MCC
RF 0.929 0.667 0.841 0.848 0.886 0.662
FURIA 0.857 0.714 0.810 0.857 0.857 0.571
RoughSet 0.833 0.595 0.754 0.805 0.819 0.445
AdaBoost 0.833 0.643 0.770 0.824 0.828 0.482
C4.5 0.893 0.667 0.817 0.843 0.867 0.596
Tables 7 presents the last classification results of the suggested technique for mining the LSVT
dataset. The random resampling technique is applied on the second reduced dataset to increase
the samples of the minority class without replacement. The training set was resized at an over-
sampling rate of 100%, in order to balance the number of instances in the two classes. This step
should lead to make a more distinct and balanced dataset. We can observe that the highest
Accuracy rate is associated with RF classifier is 95.2% with 8 features. In this step RF has
achieved the results of 96.4%, 92.9%, 96.4%, 96.4% and 89.3%, in terms of Sensitivity,
Specificity, Precision, 𝑓-measure and MCC.
The results demonstrated that the reduced features are fairly sufficient to represent the dataset’s
class information. In terms of performance measures our proposed technique succeeded in
significantly improving the classification accuracy of the minority while the classification
accuracy of major class remains high. The outcomes from the suggested two-level attribute
selection techniques show better results compared to datasets which are not pre-processed and
also when these attribute selection techniques are used independently. It is clear now that our
developed method obtained promising classification results compared to the previously published
results in [7] which scored an average Accuracy of 90% using the original dataset.
11. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
11
Table 7. RF versus other machine learning techniques in predicting LSVT with resampling pre-processing
step
Classifier Sensitivi
ty
Specificity Accuracy Precisio
n
f-
measure
MCC
RF 0.964 0.929 0.952 0.964 0.964 0.893
FURIA 0.905 0.905 0.905 0.950 0.927 0.774
RoughSet 0.917 0.905 0.913 0.951 0.933 0.794
AdaBoost 0.881 0.738 0.833 0.871 0.876 0.627
C4.5 0.917 0.881 0.905 0.939 0.928 0.779
According to figure 2, the hybrid mRMR/Tabu technique has helped in investigating the LSVT
dataset. Starting from the raw dataset, we applied mRMR feature reduction technique to reduce
the features numbers. And in the second stage we applied Tabu Search technique to identify the
significant features. This helped in reducing the dataset from 309 to 8 features. Then these
features we fed into five classification algorithms. The RF algorithm performed scores were
notably higher in the two phases.
Figure 2. Comparison of proposed model based on accuracy
6. CONCLUSION
In this article, we have investigated a two-phase technique to improve the assessment of LSVT
dataset. We concluded that the proposed algorithm could improve the accuracy performance and
achieve promising results with fewer features that in the original dataset. The experiments have
shown that the Tabu and mRMR based RF classification strategy helped in reducing the feature
space. Whilst applying simple random sampling technique helped in adjusting the region area of
the minority class in favour of handling the existing imbalanced data property.
REFERENCES
[ 1 ] Aarli, J.A.; Dua, T.; Janca, A.; Muscetta, A. Neurological Disorders: Public Health Challenges.
World Health Organization: Geneva, Switzerland, 2006.
[ 2 ] De Lau, L.M.; Breteler, M.M. Epidemiology of parkinsonʼs disease. Lancet Neurol. 2006, 5, 525–
535.
[ 3 ] Ho, A. K., Iansek, R., Marigliani, C., Bradshaw, J. L., & Gates, S. (1998). Speech impairment in a
large sample of patients with Parkinson’s disease. Behavioural Neurology, 11, 131–138.
12. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
12
[ 4 ] Ciosa, K.J. and Mooree, G.W. (2002) Uniqueness of medical data mining,Artif. Intell. Med. 26 1–
24.
[ 5 ] Huang, S.H. Wulsin, L.R. Li, H. and Guo, J. (2009) Dimensionality reduction for knowledge
discovery in medical claims database: application to antidepressant medication utilization study,
Comput. Methods Programs Biomed. 93, 115–123.
[ 6 ] Luukka, P. (2011) Feature selection using fuzzy entropy measures with similarity classifier,
Expert Systems with Applications, Volume 38, Issue 4, 4600-4607, ISSN 0957-4174,
http://dx.doi.org/10.1016/j.eswa.2010.09.133.
[ 7 ] A. Tsanas, M.A. Little, C. Fox, L.O. Ramig: Objective automatic assessment of rehabilitative
speech treatment in Parkinson’s disease, IEEE Transactions on Neural Systems and Rehabilitation
Engineering, Vol. 22, pp. 181-190, January 2014.
[ 8 ] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, “Suitability of
dysphonia measurements for telemonitoring of Parkinson’s disease,” IEEE Transactions on
Biomedical Engineering, vol. .56, no. 4, pp. 1015-1022, 2009.
[ 9 ] Das, R. (2010). A comparison of multiple classification methods for diagnosis of Parkinson
disease. Expert Systems with Applications, 37, 1568–1572.
[ 10 ] Shahbaba, B., & Neal, R. (2009). Nonlinear models using Dirichlet process mixtures. The Journal
of Machine Learning Research, 10, 1829–1850.
[ 11 ] Sakar, C. Okan, and Olcay Kursun. "Telediagnosis of Parkinson’s disease using measurements of
dysphonia." Journal of medical systems 34.4 (2010): 591-599.
[ 12 ] Psorakis, I., Damoulas, T., & Girolami, M. A. (2010). Multiclass relevance vector machines:
sparsity and accuracy. Neural Networks, IEEE Transactions on, 21, 1588–1598.
[ 13 ] Guo, P. F., Bhattacharya, P., & Kharma, N. (2010). Advances in detecting Parkinson’s disease.
Medical Biometrics, 306–314.
[ 14 ] Luukka, P. (2011). Feature selection using fuzzy entropy measures with similarity classifier.
Expert Systems with Applications, 38, 4600–4607.
[ 15 ] Li, D. C., Liu, C. W., & Hu, S. C. (2011). A fuzzy-based data transformation for feature extraction
to increase classification performance with small medical data sets. Artificial Intelligence in
Medicine, 52, 45–52.
[ 16 ] Ozcift, A., & Gulten, A. (2011). Classifier ensemble construction with rotation forest to improve
medical diagnosis performance of machine learning algorithms. Comput Methods Programs
Biomed, 104, 443–451.
[ 17 ] AStröm, F., & Koker, R. (2011). A parallel neural network approach to prediction of Parkinson’s
Disease. Expert Systems with Applications, 38, 12470–12474.
[ 18 ] Spadoto, A. A., Guido, R. C., Carnevali, F. L., Pagnin, A. F., Falcao, A. X., & Papa, J. P. (2011).
Improving Parkinson’s disease identification through evolutionarybased feature selection. In
Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of
the IEEE (pp. 7857–7860).
[ 19 ] Sriram, TarigoppulaV.S., Rao, M.Venkateswara, Narayana, G.V.Satya & Kaladhar, D.S.V.G.K.
(2015). Diagnosis of Parkinson Disease Using Machine Learning and Data Mining Systems from
Voice Dataset. In Proceedings of the 3rd International Conference on Frontiers of Intelligent
Computing: Theory and Applications (FICTA) 2014. (pp. 151-157).
13. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
13
[ 20 ] Chuan Xu, Jiajun Chen, Xia Xu, Yingyu Zhang, and Jia Li, “Potential Therapeutic Drugs for
Parkinson’s Disease Based on Data Mining and Bioinformatics Analysis,” Parkinson’s Disease,
vol. 2018, Article ID 3464578, 8 pages, 2018. https://doi.org/10.1155/2018/3464578.
[ 21 ] Marras C, Beck JC, Bower JH, Roberts E, Ritz B, Ross GW, Abbott RD, Savica R, Van Den
Eeden SK, Willis AW, Tanner CM; Parkinson’s Foundation P4 Group. Prevalence of Parkinson's
disease across North America. NPJ Parkinsons Dis. 2018 Jul 10;4:21. doi: 10.1038/s41531-018-
0058-0. PMID: 30003140; PMCID: PMC6039505.
[ 22 ] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression
data,” Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 185–205, 2005.
[ 23 ] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-
dependency, max- relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
[ 24 ] El A, Aouatif A, El A, Driss O (2011) A two-stage gene selection scheme utilizing MRMR filter
and GA wrapper. Knowl Inf Syst 26:487–500
[ 25 ] Li A, Hu L, Niu S, Cai Y, Chou K (2012) Predict and analyze Snitrosylation modification sites
with the mRMR and IFS approaches. J Proteom 75:1654–1665
[ 26 ] F. Glover, “Tabu search—part I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.
[ 27 ] F. Glover, “Tabu search—part II,” ORSA Journal on Computing, vol. 2, no. 1, pp. 4–32, 1990.
[ 28 ] Tahir, M.A., Bouridane, A., Kurugollu, F., Amira, A., 2004a. Feature Selection using Tabu Search
for Improving the Classification Rate of Prostate Needle Biopsies. In: Proc. 17th International
Conference on Pattern Recognition (ICPR 2004), Cambridge, UK.
[ 29 ] Tahir, M.A., Bouridane, A., Kurugollu, F., 2004b. Simulataneous Feature Selection and
Weigthing for Nearest Neighbor Using Tabu Search. In: Lecture Notes in Computer Science
(LNCS 3177), 5th International Conference on Intelligent Data Engineering and Automated
Learning (IDEAL 2004), Exeter, UK.
[ 30 ] Korycinski, D., Crawford, M., Barnes, J.W, and Ghosh, J., 2003. Adaptive feature selection for
hyperspectral data analysis using a binary hierarchical classifier and Tabu Search. In: Proceedings
of the IEEE International Geoscience and Remote Sensing Symposium, IGARSS.
[ 31 ] Sait, S.M., Youssef, H., 1999. General iterative algorithms for combinatorial optimization. IEEE
Computer Society.
[ 32 ] G. Weiss and F. Provost, "Learning when Training Data are Costly: The Effect of Class
Distribution on Tree Induction," J. Artificial Intelligence Research, vol.19,315-354,2003.
[ 33 ] Park, B.-H., Ostrouchov, G., Samatova, N.F., Geist, A.: Reservoir-based random sampling with
replacement from data stream. In: SDM 2004, 492-496, (2004)
[ 34 ] Estabrooks, A., Jo, T. & Japkowicz, N., 2004. A Multiple Resampling Method for Learning from
Imbalanced Data Sets. Computational Intelligence, 20(1), pp.18– 36.
[ 35 ] Yang, Y.Y. et al., 2011. Adaptive neural-fuzzy inference system for classification of rail quality
data with bootstrapping-based over-sampling. IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE 2011), pp.2205–2212.
[ 36 ] Mitra SK and Pathak PK. The nature of simple random sampling. Ann. Statist., 1984, 12:1536-
1542.
14. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.2, March 2020
14
[ 37 ] Ramig L, Pawlas A, Countryman S. The Lee Silverman voice treatment (LSVT®): a practical
guide to treating the voice and speech disorders in Parkinson disease. Iowa City, IA: National
Center for Voice and Speech; 1995.
[ 38 ] F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing
induction algorithms, Proceedings of the Fifteenth International Conference on Machine Learning,
San Francisco, CA: Morgan Kaufmann, 445-453, 1998.
[ 39 ] M.A. Maloof, Learning when data sets are Imbalanced and when costs are unequal and unknown,
ICML-2003 Workshop on Learning from Imbalanced Data Sets II, 2003.
[ 40 ] Smialowski P, Doose G, Torkler P, Kaufmann S, Frishman D: PROSO II-a new method for
protein solubility prediction. FEBS J 2012, 279(12):2192-2200.
[ 41 ] Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a Random Forest? In: Machine
learning and data mining in pattern recognition. Springer, Berlin, pp 154–168 .