An Influence of Measurement Scale of Predictor Variable on Logistic Regression Modeling and Learning Vector Quntization Modeling for Object Classification
Much real world decision making is based on binary categories of information that agree or disagree, accept or reject, succeed or fail and so on. Information of this category is the output of a classification method that is the domain of statistical field studies (eg Logistic Regression method) and machine learning (eg Learning Vector Quantization (LVQ)). The input argument of a classification method has a very crucial role to the resulting output condition. This paper investigated the influence of various types of input data measurement (interval, ratio, and nominal) to the performance of logistic regression method and LVQ in classifying an object. Logistic regression modeling is done in several stages until a model that meets the suitability model test is obtained. Modeling on LVQ was tested on several codebook sizes and selected the most optimal LVQ model. The best model of each method compared to its performance on object classification based on Hit Ratio indicator. In logistic regression model obtained 2 models that meet the model suitability test is a model with predictive variables scaled interval and nominal, while in LVQ modeling obtained 3 pieces of the most optimal model with a different codebook. In the data with interval-scale predictor variable, the performance of both methods is the same. The performance of both models is just as bad when the data have the predictor variables of the nominal scale. In the data with predictor variable has ratio scale, the LVQ method able to produce moderate enough performance, while on logistic regression modeling is not obtained the model that meet model suitability test. Thus if the input dataset has interval or ratio-scale predictor variables than it is preferable to use the LVQ method for modeling the object classification.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data Mining is about
explaining the past & predicting the future by means of data analysis. Classification is a task of data mining,
which categories data based on numerical or categorical variables. To classify the data many algorithms are
proposed, out of them five algorithms are comparatively studied for data mining through classification. There are
four different classification approaches namely Frequency Table, Covariance Matrix, Similarity Functions &
Others. As work for research on classification methods, algorithms like Naive Bayesian, K Nearest Neighbors,
Decision Tree, Artificial Neural Network & Support Vector Machine are studied & examined using benchmark
datasets like Iris & Lung Cancer.
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSijcsit
Diabetes disease is amongst the most common disease in India. It affects patient’s health and also leads to
other chronic diseases. Prediction of diabetes plays a significant role in saving of life and cost. Predicting
diabetes in human body is a challenging task because it depends on several factors. Few studies have reported the performance of classification algorithms in terms of accuracy. Results in these studies are difficult and complex to understand by medical practitioner and also lack in terms of visual aids as they arepresented in pure text format. This reported survey uses ROC and PRC graphical measures toimproveunderstanding of results. A detailed parameter wise discussion of comparison is also presented which lacksin other reported surveys. Execution time, Accuracy, TP Rate, FP Rate, Precision, Recall, F Measureparameters are used for comparative analysis and Confusion Matrix is prepared for quick review of each
algorithm. Ten fold cross validation method is used for estimation of prediction model. Different sets of
classification algorithms are analyzed on diabetes dataset acquired from UCI repository
Multimodal authentication is one of the prime concepts in current applications of real scenario. Various
approaches have been proposed in this aspect. In this paper, an intuitive strategy is proposed as a
framework for providing more secure key in biometric security aspect. Initially the features will be
extracted through PCA by SVD from the chosen biometric patterns, then using LU factorization technique
key components will be extracted, then selected with different key sizes and then combined the selected key
components using convolution kernel method (Exponential Kronecker Product - eKP) as Context-Sensitive
Exponent Associative Memory model (CSEAM). In the similar way, the verification process will be done
and then verified with the measure MSE. This model would give better outcome when compared with SVD
factorization[1] as feature selection. The process will be computed for different key sizes and the results
will be presented.
ON FEATURE SELECTION ALGORITHMS AND FEATURE SELECTION STABILITY MEASURES: A C...ijcsit
Data mining is indispensable for business organizations for extracting useful information from the huge volume of stored data which can be used in managerial decision making to survive in the competition. Due to the day-to-day advancements in information and communication technology, these data collected from ecommerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations by feature selection should be same or similar even in case of small perturbations of the dataset and is called as selection stability. It is recently becomes important topic of research community. The selection stability has been measured by various measures. This paper analyses the selection of the suitable search method and stability measure for the feature selection algorithms and also the influence of the characteristics of the dataset as the choice of the best approach is highly problem dependent.
Accounting for variance in machine learning benchmarksDevansh16
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
A SYSTEM OF SERIAL COMPUTATION FOR CLASSIFIED RULES PREDICTION IN NONREGULAR ...ijaia
Objects or structures that are regular take uniform dimensions. Based on the concepts of regular models,
our previous research work has developed a system of a regular ontology that models learning structures
in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology has
led to the modelling of a classified rules learning algorithm that predicts the actual number of rules needed
for inductive learning processes and decision making in a multiagent system. But not all processes or
models are regular. Thus this paper presents a system of polynomial equation that can estimate and predict
the required number of rules of a non-regular ontology model given some defined parameters.
When deep learners change their mind learning dynamics for active learningDevansh16
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data Mining is about
explaining the past & predicting the future by means of data analysis. Classification is a task of data mining,
which categories data based on numerical or categorical variables. To classify the data many algorithms are
proposed, out of them five algorithms are comparatively studied for data mining through classification. There are
four different classification approaches namely Frequency Table, Covariance Matrix, Similarity Functions &
Others. As work for research on classification methods, algorithms like Naive Bayesian, K Nearest Neighbors,
Decision Tree, Artificial Neural Network & Support Vector Machine are studied & examined using benchmark
datasets like Iris & Lung Cancer.
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSijcsit
Diabetes disease is amongst the most common disease in India. It affects patient’s health and also leads to
other chronic diseases. Prediction of diabetes plays a significant role in saving of life and cost. Predicting
diabetes in human body is a challenging task because it depends on several factors. Few studies have reported the performance of classification algorithms in terms of accuracy. Results in these studies are difficult and complex to understand by medical practitioner and also lack in terms of visual aids as they arepresented in pure text format. This reported survey uses ROC and PRC graphical measures toimproveunderstanding of results. A detailed parameter wise discussion of comparison is also presented which lacksin other reported surveys. Execution time, Accuracy, TP Rate, FP Rate, Precision, Recall, F Measureparameters are used for comparative analysis and Confusion Matrix is prepared for quick review of each
algorithm. Ten fold cross validation method is used for estimation of prediction model. Different sets of
classification algorithms are analyzed on diabetes dataset acquired from UCI repository
Multimodal authentication is one of the prime concepts in current applications of real scenario. Various
approaches have been proposed in this aspect. In this paper, an intuitive strategy is proposed as a
framework for providing more secure key in biometric security aspect. Initially the features will be
extracted through PCA by SVD from the chosen biometric patterns, then using LU factorization technique
key components will be extracted, then selected with different key sizes and then combined the selected key
components using convolution kernel method (Exponential Kronecker Product - eKP) as Context-Sensitive
Exponent Associative Memory model (CSEAM). In the similar way, the verification process will be done
and then verified with the measure MSE. This model would give better outcome when compared with SVD
factorization[1] as feature selection. The process will be computed for different key sizes and the results
will be presented.
ON FEATURE SELECTION ALGORITHMS AND FEATURE SELECTION STABILITY MEASURES: A C...ijcsit
Data mining is indispensable for business organizations for extracting useful information from the huge volume of stored data which can be used in managerial decision making to survive in the competition. Due to the day-to-day advancements in information and communication technology, these data collected from ecommerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations by feature selection should be same or similar even in case of small perturbations of the dataset and is called as selection stability. It is recently becomes important topic of research community. The selection stability has been measured by various measures. This paper analyses the selection of the suitable search method and stability measure for the feature selection algorithms and also the influence of the characteristics of the dataset as the choice of the best approach is highly problem dependent.
Accounting for variance in machine learning benchmarksDevansh16
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
A SYSTEM OF SERIAL COMPUTATION FOR CLASSIFIED RULES PREDICTION IN NONREGULAR ...ijaia
Objects or structures that are regular take uniform dimensions. Based on the concepts of regular models,
our previous research work has developed a system of a regular ontology that models learning structures
in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology has
led to the modelling of a classified rules learning algorithm that predicts the actual number of rules needed
for inductive learning processes and decision making in a multiagent system. But not all processes or
models are regular. Thus this paper presents a system of polynomial equation that can estimate and predict
the required number of rules of a non-regular ontology model given some defined parameters.
When deep learners change their mind learning dynamics for active learningDevansh16
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
A Magnified Application of Deficient Data Using Bolzano Classifierjournal ijrtem
Abstract: Deficient and Inconsistent knowledge are a pervasive and lasting problem in heavy information sets. However, basic accusation is attractive often used to impute missing data, whereas multiple imputation generates right value to replace. Consequently, variety of machine learning (ML) techniques are developed to reprocess the incomplete information. It is estimated multiple imputation of missing information in large datasets and the performance of MI focuses on several unsupervised ML algorithms like mean, median, standard deviation and Supervised ML techniques for probabilistic algorithm like NBI classifier. Such research is carried out employing a comprehensive range of databases, for which missing cases are first filled in by various sets of plausible values to create multiple completed datasets, then standard complete- data operations are applied to each completed dataset, and finally the multiple sets of results combine to generate a single inference. The main aim of this report is to offer general guidelines for selection of suitable data imputation algorithms based on characteristics of the data. Implementing Bolzano Weierstrass theorem in machine learning techniques to assess the functioning of every sequence of rational and irrational number has a monotonic subsequence and specify every sequence always has a finite boundary. For estimate imputation of missing data, the standard machine learning repository dataset has been applied. Tentative effects manifest the proposed approach to have good accuracy and the accuracy measured in terms of percent. Keywords: Bolzano Classifier, Maximum Likelihood, NBI classifier, posterior probability, predictor probability, prior probability, Supervised ML, Unsupervised ML.
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L.Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous efforts from medical experts. Therefore, Ai has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools highly depend on data for training the models. However, there are several constraints to access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve a very close performance to the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improving the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable for any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
Reach out to me:
Check out my other articles on Medium. : https://machine-learning-made-simple....
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn: https://www.linkedin.com/in/devansh-d...
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
A survey of modified support vector machine using particle of swarm optimizat...Editor Jacotech
The main objective of this survey paper is to provide a detailed description of Wireless Sensor Networks with Medium Access Control layer and Routing layer. In the medium access control layer, Event Driven Time Division Multiple Access protocol is studied and in Network layer, two routing protocols Bellman-Ford and Dynamic Source Routing are studied.
Literature Survey: Clustering TechniqueEditor IJCATR
Clustering is a partition of data into the groups of similar or dissimilar objects. Clustering is unsupervised learning
technique helps to find out hidden patterns of Data Objects. These hidden patterns represent a data concept. Clustering is used in many
data mining applications for data analysis by finding data patterns. There is a number of clustering techniques and algorithms are
available to cluster the data object. According to the type of data object and structure appropriate clustering technique is selected. This
survey focuses on the clustering techniques for their input attribute data type, their input parameters and output. The main objective is
not to understand the actual working of clustering technique. Instead, the input data requirement and input parameters of clustering
technique are focused.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is ranking the individual
features according to some criteria and then search for an optimal feature subset based on an evaluation
criterion to test the optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children with reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The process of feature ranking
follows a method of calculating the significance or priority of each symptoms of LD as per their
contribution in representing the knowledge contained in the dataset. Each symptoms significance or
priority values reflect its relative importance to predict LD among the various cases. Then by eliminating
least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is combined with two state-of-theart
filter based feature ranking techniques viz. information gain and gain ratio. The experimental results
show the proposed feature selection approach outperforms the other two in terms of the data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
Comparative study of classification algorithm for text based categorizationeSAT Journals
Abstract
Text categorization is a process in data mining which assigns predefined categories to free-text documents using machine
learning techniques. Any document in the form of text, image, music, etc. can be classified using some categorization techniques.
It provides conceptual views of the collected documents and has important applications in the real world. Text based
categorization is made use of for document classification with pattern recognition and machine learning. Advantages of a number
of classification algorithms have been studied in this paper to classify documents. An example of these algorithms is: Naive Bayes'
algorithm, K-Nearest Neighbor, Decision Tree etc. This paper presents a comparative study of advantages and disadvantages of
the above mentioned classification algorithm.
Keywords: Data Mining, Text Mining, Text Categorization, Machine Learning, Pattern Analysis, Naive Bayes’, KNN,
Decision Tree.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as
automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic
clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization
stratagem. This evolutionary technique always aims
to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to
detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal
values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance
of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...IJCI JOURNAL
When only a few lower modes data are available to evaluate a large number of unknown parameters, it is
difficult to acquire information about all unknown parameters. The challenge in this kind of updation
problem is first to get confidence about the parameters that are evaluated correctly using the available
data and second to get information about the remaining parameters. In this work, the first issue is resolved
employing the sensitivity of the modal data used for updation. Once it is fixed that which parameters are
evaluated satisfactorily using the available modal data the remaining parameters are evaluated employing
modal data of a virtual structure. This virtual structure is created by adding or removing some known
stiffness to or from some of the stories of the original structure. A 12-story shear building is considered for
the numerical illustration of the approach. Results of the study show that the present approach is an
effective tool in system identification problem when only a few data is available for updation.
Analysis On Classification Techniques In Mammographic Mass Data SetIJERA Editor
Data mining, the extraction of hidden information from large databases, is to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data-Mining classification techniques deals with determining to which group each data instances are associated with. It can deal with a wide variety of data so that large amount of data can be involved in processing. This paper deals with analysis on various data mining classification techniques such as Decision Tree Induction, Naïve Bayes , k-Nearest Neighbour (KNN) classifiers in mammographic mass dataset.
Smooth Support Vector Machine for Suicide-Related Behaviours Prediction IJECEIAES
Suicide-related behaviours need to be prevented on psychiatric patients. Prediction of those behaviours based on patient medical records would be very useful for the prevention by the psychiatric hospital. This research focused on developing this prediction at the only one psychiatric hospital of Bali Province by using Smooth Support Vector Machine method, as the further development of Support Vector Machine. The method used 30.660 patient medical records from the last five years. Data cleaning gave 2665 relevant data for this research, includes 111 patients that have suicide-related behaviours and under active treatment. Those cleaned data then were transformed into ten predictor variables and a response variable. Splitting training and testing data on those transformed data were done for building and accuracy evaluation of the method model. Based on the experiment, the best average accuracy at 63% can be obtained by using 30% of relevant data as data testing and by using training data which has one-to-one ratio in number between patients that have suicide-related behaviours and patients that have no such behaviours. In the future work, accuracy improvement need to be confirmed by using Reduced Support Vector Machine method, as the further development of Smooth Support Vector Machine.
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
In this paper Compare the performance of two
classification algorithm. I t is useful to differentiate
algorithms based on computational performance rather
than classification accuracy alone. As although
classification accuracy between the algorithms is similar,
computational performance can differ significantly and it
can affect to the final results. So the objective of this paper
is to perform a comparative analysis of two machine
learning algorithms namely, K Nearest neighbor,
classification and Logistic Regression. In this paper it
was considered a large dataset of 7981 data points and 112
features. Then the performance of the above mentioned
machine learning algorithms are examined. In this paper
the processing time and accuracy of the different machine
learning techniques are being estimated by considering the
collected data set, over a 60% for train and remaining
40% for testing. The paper is organized as follows. In
Section I, introduction and background analysis of the
research is included and in section II, problem statement.
In Section III, our application and data analyze Process,
the testing environment, and the Methodology of our
analysis are being described briefly. Section IV comprises
the results of two algorithms. Finally, the paper concludes
with a discussion of future directions for research by
eliminating the problems existing with the current
research methodology.
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In a data survey, heterogeneous information is established with various types of data scales like nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and scanty decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for the heterogeneous categorical data. However, it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data solution using a mini batch k-means with entropy measure (MBKEM) which is to investigate the effectiveness of similarity measure in clustering method using heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate the proposed framework has improved the clustering’s quality. MBKEM outperformed other clustering algorithms with the accuracy at 0.88, v-measure (VM) at 0.82, adjusted rand index (ARI) at 0.87, and Fowlkes-Mallow’s index (FMI) at 0.94. It is observed that the average minimum elapsed time-varying for cluster generation, k at 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
A Magnified Application of Deficient Data Using Bolzano Classifierjournal ijrtem
Abstract: Deficient and Inconsistent knowledge are a pervasive and lasting problem in heavy information sets. However, basic accusation is attractive often used to impute missing data, whereas multiple imputation generates right value to replace. Consequently, variety of machine learning (ML) techniques are developed to reprocess the incomplete information. It is estimated multiple imputation of missing information in large datasets and the performance of MI focuses on several unsupervised ML algorithms like mean, median, standard deviation and Supervised ML techniques for probabilistic algorithm like NBI classifier. Such research is carried out employing a comprehensive range of databases, for which missing cases are first filled in by various sets of plausible values to create multiple completed datasets, then standard complete- data operations are applied to each completed dataset, and finally the multiple sets of results combine to generate a single inference. The main aim of this report is to offer general guidelines for selection of suitable data imputation algorithms based on characteristics of the data. Implementing Bolzano Weierstrass theorem in machine learning techniques to assess the functioning of every sequence of rational and irrational number has a monotonic subsequence and specify every sequence always has a finite boundary. For estimate imputation of missing data, the standard machine learning repository dataset has been applied. Tentative effects manifest the proposed approach to have good accuracy and the accuracy measured in terms of percent. Keywords: Bolzano Classifier, Maximum Likelihood, NBI classifier, posterior probability, predictor probability, prior probability, Supervised ML, Unsupervised ML.
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L.Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous efforts from medical experts. Therefore, Ai has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools highly depend on data for training the models. However, there are several constraints to access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve a very close performance to the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improving the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable for any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
Reach out to me:
Check out my other articles on Medium. : https://machine-learning-made-simple....
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn: https://www.linkedin.com/in/devansh-d...
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
A survey of modified support vector machine using particle of swarm optimizat...Editor Jacotech
The main objective of this survey paper is to provide a detailed description of Wireless Sensor Networks with Medium Access Control layer and Routing layer. In the medium access control layer, Event Driven Time Division Multiple Access protocol is studied and in Network layer, two routing protocols Bellman-Ford and Dynamic Source Routing are studied.
Literature Survey: Clustering TechniqueEditor IJCATR
Clustering is a partition of data into the groups of similar or dissimilar objects. Clustering is unsupervised learning
technique helps to find out hidden patterns of Data Objects. These hidden patterns represent a data concept. Clustering is used in many
data mining applications for data analysis by finding data patterns. There is a number of clustering techniques and algorithms are
available to cluster the data object. According to the type of data object and structure appropriate clustering technique is selected. This
survey focuses on the clustering techniques for their input attribute data type, their input parameters and output. The main objective is
not to understand the actual working of clustering technique. Instead, the input data requirement and input parameters of clustering
technique are focused.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is ranking the individual
features according to some criteria and then search for an optimal feature subset based on an evaluation
criterion to test the optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children with reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The process of feature ranking
follows a method of calculating the significance or priority of each symptoms of LD as per their
contribution in representing the knowledge contained in the dataset. Each symptoms significance or
priority values reflect its relative importance to predict LD among the various cases. Then by eliminating
least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is combined with two state-of-theart
filter based feature ranking techniques viz. information gain and gain ratio. The experimental results
show the proposed feature selection approach outperforms the other two in terms of the data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
Comparative study of classification algorithm for text based categorizationeSAT Journals
Abstract
Text categorization is a process in data mining which assigns predefined categories to free-text documents using machine
learning techniques. Any document in the form of text, image, music, etc. can be classified using some categorization techniques.
It provides conceptual views of the collected documents and has important applications in the real world. Text based
categorization is made use of for document classification with pattern recognition and machine learning. Advantages of a number
of classification algorithms have been studied in this paper to classify documents. An example of these algorithms is: Naive Bayes'
algorithm, K-Nearest Neighbor, Decision Tree etc. This paper presents a comparative study of advantages and disadvantages of
the above mentioned classification algorithm.
Keywords: Data Mining, Text Mining, Text Categorization, Machine Learning, Pattern Analysis, Naive Bayes’, KNN,
Decision Tree.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as
automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic
clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization
stratagem. This evolutionary technique always aims
to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to
detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal
values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance
of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ...IJCI JOURNAL
When only a few lower modes data are available to evaluate a large number of unknown parameters, it is
difficult to acquire information about all unknown parameters. The challenge in this kind of updation
problem is first to get confidence about the parameters that are evaluated correctly using the available
data and second to get information about the remaining parameters. In this work, the first issue is resolved
employing the sensitivity of the modal data used for updation. Once it is fixed that which parameters are
evaluated satisfactorily using the available modal data the remaining parameters are evaluated employing
modal data of a virtual structure. This virtual structure is created by adding or removing some known
stiffness to or from some of the stories of the original structure. A 12-story shear building is considered for
the numerical illustration of the approach. Results of the study show that the present approach is an
effective tool in system identification problem when only a few data is available for updation.
Analysis On Classification Techniques In Mammographic Mass Data SetIJERA Editor
Data mining, the extraction of hidden information from large databases, is to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data-Mining classification techniques deals with determining to which group each data instances are associated with. It can deal with a wide variety of data so that large amount of data can be involved in processing. This paper deals with analysis on various data mining classification techniques such as Decision Tree Induction, Naïve Bayes , k-Nearest Neighbour (KNN) classifiers in mammographic mass dataset.
Smooth Support Vector Machine for Suicide-Related Behaviours Prediction IJECEIAES
Suicide-related behaviours need to be prevented on psychiatric patients. Prediction of those behaviours based on patient medical records would be very useful for the prevention by the psychiatric hospital. This research focused on developing this prediction at the only one psychiatric hospital of Bali Province by using Smooth Support Vector Machine method, as the further development of Support Vector Machine. The method used 30.660 patient medical records from the last five years. Data cleaning gave 2665 relevant data for this research, includes 111 patients that have suicide-related behaviours and under active treatment. Those cleaned data then were transformed into ten predictor variables and a response variable. Splitting training and testing data on those transformed data were done for building and accuracy evaluation of the method model. Based on the experiment, the best average accuracy at 63% can be obtained by using 30% of relevant data as data testing and by using training data which has one-to-one ratio in number between patients that have suicide-related behaviours and patients that have no such behaviours. In the future work, accuracy improvement need to be confirmed by using Reduced Support Vector Machine method, as the further development of Smooth Support Vector Machine.
Smooth Support Vector Machine for Suicide-Related Behaviours Prediction
Similar to An Influence of Measurement Scale of Predictor Variable on Logistic Regression Modeling and Learning Vector Quntization Modeling for Object Classification
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
In this paper Compare the performance of two
classification algorithm. I t is useful to differentiate
algorithms based on computational performance rather
than classification accuracy alone. As although
classification accuracy between the algorithms is similar,
computational performance can differ significantly and it
can affect to the final results. So the objective of this paper
is to perform a comparative analysis of two machine
learning algorithms namely, K Nearest neighbor,
classification and Logistic Regression. In this paper it
was considered a large dataset of 7981 data points and 112
features. Then the performance of the above mentioned
machine learning algorithms are examined. In this paper
the processing time and accuracy of the different machine
learning techniques are being estimated by considering the
collected data set, over a 60% for train and remaining
40% for testing. The paper is organized as follows. In
Section I, introduction and background analysis of the
research is included and in section II, problem statement.
In Section III, our application and data analyze Process,
the testing environment, and the Methodology of our
analysis are being described briefly. Section IV comprises
the results of two algorithms. Finally, the paper concludes
with a discussion of future directions for research by
eliminating the problems existing with the current
research methodology.
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In a data survey, heterogeneous information is established with various types of data scales like nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and scanty decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for the heterogeneous categorical data. However, it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data solution using a mini batch k-means with entropy measure (MBKEM) which is to investigate the effectiveness of similarity measure in clustering method using heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate the proposed framework has improved the clustering’s quality. MBKEM outperformed other clustering algorithms with the accuracy at 0.88, v-measure (VM) at 0.82, adjusted rand index (ARI) at 0.87, and Fowlkes-Mallow’s index (FMI) at 0.94. It is observed that the average minimum elapsed time-varying for cluster generation, k at 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
Data mining is indispensable for business organizations for extracting useful information from the huge volume of stored data which can be used in managerial decision making to survive in the competition. Due to the day-to-day advancements in information and communication technology, these data collected from ecommerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations by feature selection should be same or similar even in case of small perturbations of the dataset and is called as selection stability. It is recently becomes important topic of research community. The selection stability has been measured by various measures. This paper analyses the selection of the suitable search method and stability measure for the feature selection algorithms and also the influence of the characteristics of the dataset as the choice of the best approach is highly problem dependent.
Data mining is indispensable for business organizations for extracting useful information from the huge
volume of stored data which can be used in managerial decision making to survive in the competition. Due
to the day-to-day advancements in information and communication technology, these data collected from e-
commerce and e-governance are mostly high dimensional. Data mining prefers small datasets than high
dimensional datasets. Feature selection is an important dimensionality reduction technique. The subsets
selected in subsequent iterations by feature selection should be same or similar even in case of small
perturbations of the dataset and is called as selection stability. It is recently becomes important topic of
research community. The selection stability has been measured by various measures. This paper analyses
the selection of the suitable search method and stability measure for the feature selection algorithms and
also the influence of the characteristics of the dataset as the choice of the best approach is highly problem
dependent.
Classification of data is a data mining technique based on machine learning is used to classification of each item set in as a set of dataset into a set of predefined labelled as classes or groups. Classification is tasks for different application such as text classification, image classification, class’s predictions, data Classification etc. In this paper, we presenting the major classification techniques used for prediction of classes using supervised learning dataset. Several major types of classification method including Random Forest, Naive Bayes, Support Vector Machine (SVM) techniques. The goal of this review paper is to provide a review, accuracy and comparative between different classification techniques in data mining.
A Formal Machine Learning or Multi Objective Decision Making System for Deter...Editor IJCATR
Decision-making typically needs the mechanisms to compromise among opposing norms. Once multiple objectives square measure is concerned of machine learning, a vital step is to check the weights of individual objectives to the system-level performance. Determinant, the weights of multi-objectives is associate in analysis method, associated it's been typically treated as a drawback. However, our preliminary investigation has shown that existing methodologies in managing the weights of multi-objectives have some obvious limitations like the determination of weights is treated as one drawback, a result supporting such associate improvement is limited, if associated it will even be unreliable, once knowledge concerning multiple objectives is incomplete like an integrity caused by poor data. The constraints of weights are also mentioned. Variable weights square measure is natural in decision-making processes. Here, we'd like to develop a scientific methodology in determinant variable weights of multi-objectives. The roles of weights in a creative multi-objective decision-making or machine-learning of square measure analyzed, and therefore the weights square measure determined with the help of a standard neural network.
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHcscpconf
Due to the intangible nature of “software”, accurate and reliable software effort estimation is a challenge in the software Industry. It is unlikely to expect very accurate estimates of software
development effort because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in the software engineering datasets because data is made available from diverse sources.
This can be reduced by defining certain relationship between the data values by classifying them into different clusters. This study focuses on how the combination of clustering and
regression techniques can reduce the potential problems in effectiveness of predictive efficiency due to heterogeneity of the data. Using a clustered approach creates the subsets of data having a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than other regression techniques used in the analysis.
Estimating project development effort using clustered regression approachcsandit
Due to the intangible nature of “software”, accurate and reliable software effort estimation is a
challenge in the software Industry. It is unlikely to expect very accurate estimates of software
development effort because of the inherent uncertainty in software development projects and the
complex and dynamic interaction of factors that impact software development. Heterogeneity
exists in the software engineering datasets because data is made available from diverse sources.
This can be reduced by defining certain relationship between the data values by classifying
them into different clusters. This study focuses on how the combination of clustering and
regression techniques can reduce the potential problems in effectiveness of predictive efficiency
due to heterogeneity of the data. Using a clustered approach creates the subsets of data having
a degree of homogeneity that enhances prediction accuracy. It was also observed in this study
that ridge regression performs better than other regression techniques used in the analysis.
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
selection of relevant feature from a given set of feature is one of the important issues in the field of
data mining as well as classification. In general the dataset may contain a number of features however it is not
necessary that the whole set features are important for particular analysis of decision making because the
features may share the common information‟s and can also be completely irrelevant to the undergoing
processing. This generally happen because of improper selection of features during the dataset formation or
because of improper information availability about the observed system. However in both cases the data will
contain the features that will just increase the processing burden which may ultimately cause the improper
outcome when used for analysis. Because of these reasons some kind of methods are required to detect and
remove these features hence in this paper we are presenting an efficient approach for not just removing the
unimportant features but also the size of complete dataset size. The proposed algorithm utilizes the information
theory to detect the information gain from each feature and minimum span tree to group the similar features
with that the fuzzy c-means clustering is used to remove the similar entries from the dataset. Finally the
algorithm is tested with SVM classifier using 35 publicly available real-world high-dimensional dataset and the
results shows that the presented algorithm not only reduces the feature set and data lengths but also improves the
performances of the classifier.
DIFFERENCE OF PROBABILITY AND INFORMATION ENTROPY FOR SKILLS CLASSIFICATION A...ijaia
The probability of an event is in the range of [0, 1]. In a sample space S, the value of probability determines whether an outcome is true or false. The probability of an event Pr(A) that will never occur = 0. The probability of the event Pr(B) that will certainly occur = 1. This makes both events A and B thus a certainty. Furthermore, the sum of probabilities Pr(E1) + Pr(E2) + … + Pr(En) of a finite set of events in a given sample space S = 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. Firstly, this paper discusses Bayes’ theorem, then complement of probability and the difference of probability for occurrences of learning-events, before applying these in the prediction of learning objects in student learning. Given the sum total of 1; to make recommendation for student learning, this paper submits that the difference of argMaxPr(S) and probability of student-performance quantifies the weight of learning objects for students. Using a dataset of skill-set, the computational procedure demonstrates: i) the probability of skill-set events that has occurred that would lead to higher level learning; ii) the probability of the events that has not occurred that requires subject-matter relearning; iii) accuracy of decision tree in the prediction of student performance into class labels; and iv) information entropy about skill-set data and its implication on student cognitive performance and recommendation of learning [1].
An overlapping conscious relief-based feature subset selection methodIJECEIAES
Feature selection is considered as a fundamental prepossessing step in various data mining and machine learning based works. The quality of features is essential to achieve good classification performance and to have better data analysis experience. Among several feature selection methods, distance-based methods are gaining popularity because of their eligibility in capturing feature interdependency and relevancy with the endpoints. However, most of the distance-based methods only rank the features and ignore the class overlapping issues. Features with class overlapping data work as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features discarding the noisy ones. Experimental results over 20 benchmark dataset demonstrates the superiority of OMsurf over six existing state-of-the-art methods.
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...IJECEIAES
The variable selection is an important technique the reducing dimensionality of data frequently used in data preprocessing for performing data mining. This paper presents a new variable selection algorithm uses the heuristic variable selection (HVS) and Minimum Redundancy Maximum Relevance (MRMR). We enhance the HVS method for variab le selection by incorporating (MRMR) filter. Our algorithm is based on wrapper approach using multi-layer perceptron. We called this algorithm a HVS-MRMR Wrapper for variables selection. The relevance of a set of variables is measured by a convex combination of the relevance given by HVS criterion and the MRMR criterion. This approach selects new relevant variables; we evaluate the performance of HVS-MRMR on eight benchmark classification problems. The experimental results show that HVS-MRMR selected a less number of variables with high classification accuracy compared to MRMR and HVS and without variables selection on most datasets. HVS-MRMR can be applied to various classification problems that require high classification accuracy.
the ppt is about classification and prediction in data mining
Similar to An Influence of Measurement Scale of Predictor Variable on Logistic Regression Modeling and Learning Vector Quntization Modeling for Object Classification (20)
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IOT. Let's dig in!!!!
For more such content visit: https://nttftrg.com/
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSveerababupersonal22
It consists of cw radar and fmcw radar ,range measurement,if amplifier and fmcw altimeterThe CW radar operates using continuous wave transmission, while the FMCW radar employs frequency-modulated continuous wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous wave technology to accurately measure altitude above a reference point.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Steel & Timber Design according to British Standard
An Influence of Measurement Scale of Predictor Variable on Logistic Regression Modeling and Learning Vector Quntization Modeling for Object Classification
2. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 1, February 2018 : 333 – 343
334
most crucial thing is the information resulting from the classification used as the basis for decision-making or
policy. Although the attributes of the object are very wide in scope, they are either tangible or intangible,
observable or unobservable. However, in order to apply a method of classification, it is necessary to
characterize the objects arranged in a particular data structure. The most commonly used data structure as an
input argument of a classification method is an input-output pair.
In the statistical field, the data in the input-output pair format should be considered as a causality
where the set of observed values in the input attribute will determine the observed value of the output
attribute [3]. The attributes of an object that has a special property that has a unique or single value on the
object are known by the term variable. When an object is observed on the basis of 5 variables, it will get 5
observation values associated with the object. These types of observational values have 4 types of
measurement scales: nominal, ordinal, interval, and ratio. This type of measurement scale in statistics will
greatly influence the selection of the most suitable analytical methods. Suppose that if the output variables
(variables affected by the input variables) are nominal or ordinal, then the statistical modeling for the
classification of objects is logistic regression [4], [5]. On the other hand, the machine learning method does
not require a causality between the input-output variables and also does not concern the type of measurement
scale in the output variable [6], [7].
Hand and Henley [8] have reviewed the methods used in object classification. They concluded that
the classification methods which are easy to understand (such as regression, nearest neighbour and tree-based
methods) are much more appealing, both to users and to clients, than are methods which are essentially black
boxes (such as Artificial Neural Network). They also permit more ready explanations of the sort of reasons
why the methods have reached their decisions. Meanwhile Dreiseitl and Ohno-Machado [9] sampled 72
papers comparing both logistic regression and neural network models on medical data sets. They analyzed
these papers with respect to several criteria, such as the size of data sets, model parameter, selection scheme,
and performance measure used in reporting model results. They said that where performance was compared
statistically, there was a 5:2 ratio of cases in which it was not significantly better to use neural networks.
Performance improvement of logistic regression model on microarray data with the Bayesian
approach to gene selection and classification using the logistic regression model. The method can effectively
identify important genes consistent with the known biological findings while the accuracy of the
classification is also high [10]. In addition, the performance comparison between ANN and logistic
regression in various fields are done by Felicisimo, et al [11] did Mapping landslide susceptibility, M.
Shafiee, et al [12] did Forecasting Stock Returns in Iran Stock Exchange, and Kamley S, et al [13] did
Forecasting of Share Market. From various studies, ANN method used is back propagation or support vector
machine. Both methods are widely used because it has a topology and learning methods that are easy to
understand. On the other hand, there is also ANN method known as Learning Vector Quantization (LVQ)
which is still rarely found applied, because this method has a competitive layer that works using the principle
of the self-organizing map [14], [15] so it has a structure and learning method that is difficult to understood.
LVQ is a classification method in which each unit of output represents a class that can be used for grouping
where the number of target groups or classes is pre-determined.
Based on the above exposure, this paper will examine the implementation of logistic regression and
LVQ network for object classification in three datasets with input variables (predictors) with different
measurement scales, respectively are intervals, ratios and nominal for data 1, data 2, and data 3. In the
logistic regression modeling is done parameter estimation stage, testing the model parameters, then test the
goodness of fit, finally obtained a suitable model. In LVQ network modeling the 4 codebooks are tested ie 2,
10, 30, and 50. The best models produced by both logistic regression and LVQ networks will be evaluated
for classification using Hit Ratio [16], ie the proportion of sample observations that can be classified by the
classification model appropriately. Implementation of both methods using software R.
2. LITERATURE REVIEW
2.1. Binary Logistic Regression Analysis
Binary logistic regression is a logistic regression with response variables that are categorical values
of binary or dichotomous. The variable response of Bernoulli's Y distributes with the following probability
functions [3]:
( ) ( ) ( ( )) (1)
The logistic regression model :
3. Int J Elec & Comp Eng ISSN: 2088-8708
An Influence of Measurement Scale of Predictor Variable on Logistic Regression… (Waego Hadi Nugroho)
335
(2)
Where: n=Number of observations
p=Number of predictor variables
=Intercept
=Logistic regression coefficient from the j th jth predictor variable
=The value of the j-th predictor variable on the i-th observation.
While the logit form is:
( ) ( ∑ ) (3)
In logistic regression, the conditional distribution pattern of the response variable is
Y=π (xi) + ε, which has 2 types of error:
Y=1 then ε=1-π (x) with probability π (x)
Y=0 then ε=-π (x) with a chance of 1-π (x)
So the error distribution has the mean equal to zero, variance {π (x) (1-π (x))} and follows the
Binomial distribution.
2.1.1. Estimation of Model Parameters
The method to estimate the logistic regression model parameters is Maximum Likelihood
Estimation (MLE). The model parameter is estimated from the ( ) vector, the value
is obtained by maximizing the likelihood function (L (β)) through derivation of its parameters. The likelihood
function is a joint probability function of the variables and . The probability distribution function for
each (xi,yi), is
( ) ( ) ( ( )) (4)
where, ( )
( ∑ )
* ( ∑ )+
According to Hosmer and Lemeshow [4], if inter-observations are assumed to be independent, the
likelihood function is the multiplication of each probability distribution in Equation (4).
( ) ∏ ( ) ∏ ( ) ( ( ))
( ) [ ( )]
∑ * (
( )
( )
) ( ( ))+
∑ * ( ∑ ) ( ( ∑ )) +
∑ ( ∑ ) ∑ ( ( ∑ ))
Max likelihood is obtained by derivating ( ) to β and equating it with zero.
( )
∑ ∑ [
( ∑ )
( ∑ )
]
∑ ∑ ( )
∑ ( ( )) (5)
Since Equation (5) is non-linear, the solution of this equation becomes difficult to resolve
analytically, requiring an iterative solution such as the Newton-Raphson method [5].
( )
( ∑ )
( ∑ )
i=1, 2,
…, n
j=1, 2,
…, p
4. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 1, February 2018 : 333 – 343
336
2.1.2. Parameter Significance Testing
a. Simultaneous Testing
The simultaneous test is performed to examine the role of each predictor variable in the model
simultaneously. Statistical hypotheses and test statistics are as follows [3]:
Hypothesis: H0: versus H1: at least one ;
(
( )
( )
* [ ( ) ( )] (6)
where: with ; j=1,2,…,p
without ; j=1,2,…,p
The G value is compared with the statistic ( ) with degrees of freedom corresponding to the
estimated number of parameters. H0 will be accepted if p-value is greater than the probability of doing type I
error of α.
b. Partial Testing
Partial testing is used to test the effect of each parameter on the model individually or separately
with regard to other parameters. The partial test results will show whether a predictor variable is eligible to
enter the model or not. If hypothesi test yields significant, then an enter in model. Hypothesis: H0: =0
versus H1: 0.
Wald statistic test,
̂
( ̂ )
The test statistic used is the Wald test, reject H0 if the value of | | ( ) or p-value <0.05, so it
can be concluded there is influence between predictor variables with response variables [4]. In addition to
following the normal distribution, the Wald statistical test squared ( ) will follow the chi-square with
the degree of freedom one [3].
2.1.3. Goodness of Fit Test
Model fit test on logistic regression, can use test statistic called goodness of fit test. This test statistic
is used to find out how big the effectiveness of the model formed in explaining the response variable, so the
model can represent the actual condition represented by the data used in the analysis. The hypothesis and test
statistic are as follows [3]:
Hypothesis: H0: The model appropriate vs H1: The model is not appropriate.
The test statistic used is ie ∑ . H0 is rejected if ( ( )).
2.2. Artificial Neural Network with Competitive Layer
Artificial neural networks (ANN) with competitive layers have three layers: the input layer, the
hidden layer, and the output layer. In this case the competitive layer lies in the hidden layer. Neurons in
networks with competitive layers compete for active rights. One of the competitive layer network model is
LVQ. There are two learning methods in ANN namely supervised learning and unsupervised learning. In
Supervised learning, every pattern given as input for ANN, has been known output. The difference between
the ANN output and the desired output (target) is called an error. This error quantity is used to correct ANN
weight so that ANN can produce output as close as possible to known target pattern. ANN learning algorithm
using this method is Hebbian, Perceptron, Adaline, Boltzman, Hopfield, Backpropagation, and LVQ.
One of the aims of ANN modeling is for classification. According Fausett [6], the basis of
classification on ANN is to use the optimal weight of the learning process. The weights are the amounts or
values that exist on the connection between neurons that transfer data from one layer to another, which serves
to regulate the network so as to produce the desired output. In addition to having advantages that do not have
to meet the classical assumptions and varian homogenity of errors, ANN also has a weakness that takes
longer training time to perform calculations in the formation of models [13].
LVQ is a classification method in which each output unit presents a class with a specified target
class. LVQ uses a supervised competitive learning algorithm version of the Kohonen Self-Organizing Map
(SOM) algorithm. LVQ network architecture according to Kaski and Kohonen [15] can be seen in Figure 1.
5. Int J Elec & Comp Eng ISSN: 2088-8708
An Influence of Measurement Scale of Predictor Variable on Logistic Regression… (Waego Hadi Nugroho)
337
Figure 1. The architecture LVQ network
where: R=number of elements in the input vector
S1
=the number of competitive neurons
S2
=number of linear neurons
Based on Figure 1., the LVQ network consists of three layers: an input layer, a competitive layer,
and an output layer. Learning in the competitive layer aims to classify each input vector to a hidden layer.
While learning on the output layer aims at transforming the subclass on the competitive layer into the
classification of the target class that has been established. Learning the competitive layer as a subclass and
learning from the output layer as the target class. According to Putri (2012), there are two factors that
influence the learning process in LVQ namely initial initialization and training rate.
Setting data as input on LVQ network must be in input-output pair format. In this case the output
data will serve as a target in the learning process. Suppose in the format as follows:
{ ( ) ( )
} q=1, 2, ..., Q (7)
Where: ( )
=vector/input matrix
( )
=the output vector
LVQ consists of a competitive layer that includes a competitive subnet and a linear output layer. In a
competitive layer, each neuron is assigned to a class. Different neurons in the competitive layer, it is possible
to have the same class. Each class is then paired with one of the neurons in the output layer. Thus the number
of neurons in the competitive layer, at least as much as the number of neurons in the linear output layer [14].
The relationship between the input vector and one of the weight vectors is measured by the Euclid distance.
A subnet is used to find the smallest element in the input data.
( )
[
‖
( )
‖
‖
( )
‖
‖
( )
‖]
(8)
An element given value 1 indicates that the input vector belongs to the intended class, and an
element is assigned a value of 0 if the input vector is not included in the desired class. This can be
represented by a subnet as a vector with the following vector functions:
a(1)
=compet ( ( )
) (9)
Linear output layer of LVQ network, used to combine subclasses into a single class. This is done
using the weight matrix ( )
, ie the weight matrix having elements:
, (10)
In addition, the weight matrix ( )
, on the competitive layer must be trained using the Kohonen
SOM rule as follows:
At each iteration, each training vector is entered into the network as input x and the Euclid distance
from the input vector to each prototype vector (weighted matrix column) is calculated. Neuron j* wins the
6. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 1, February 2018 : 333 – 343
338
competition if Euclid's distance between x and j* prototype vector is the smallest. The activation value a(1)
is
multiplied by n ( )
in its right position to obtain input n (2)
. The output a(2)
=n(2)
, as long as the transfer
function in the output neuron is an identity function. The a(2)
also has only one value in the element k*,
indicating that the input vector belongs to the class k*. Kohonen rules are used to fix the weights on the
hidden layer. If x is correctly classified, the weight vector
( )
is the winner so that the hidden neuron is
moved closer to x.
( )
(
( )
) if ( ) (11)
But if x is classified incorrectly, it is obvious that the wrong hidden neurons win the competition. In
this case, the weight is moved away from the x.
( )
(
( )
) if ( ) (12)
After the training, the final weights (w) will be used for the next simulation, test or
classification [15].
2.3. Accuracy of Classification
According to Dianiati (2013), prior to classification, the data is divided into two datasets. The first
part is the training dataset used to form the optimal model of artificial neural networks, while the second part
is the testing dataset to test the optimal model obtained from the training dataset. Hair, et al.[5] explains the
principle of sharing the most popular proportion of training-testing datasets is 50-50, but most researchers
also use the 60-40 or 75-25 division principle, since there is no standard rule about dividing the dataset. The
precision in classification can be determined by calculating the value of Hit Ratio, ie the proportion of
observational samples that can be classified by the classification function [16]. The value of Hit Ratio can be
calculated using the following formula:
(13)
The Hit Ratio value calculated according to the Equation (13) shows performance of classification
function. A biger Hit Ratio value indicates a better classification method.
3. RESEARCH METHOD
3.1. Data Characteristics
The data used are secondary data obtained from three previous studies. The three datasets have
different response and predictor variables. The explanation of the data used is shown in Table 1.
Table 1. Predictor and Response Variable of the Data
Dataset
Data
sources
Predictor variables Predictor
scale
Categorical respon
variable
record
Diet on toddlers
Sartika
[17]
=Carbohydrates
=Vegetables
=Side dish
=Fruit
=Milk
Interval Bad diet 0
108
Diet is enough 1
The granting of
credit for seaweed
business
Sunadji
[18]
=Experience
=Duration of education
=Employment Intensity
=Age
=Seaweed cleanliness level
==Seaweed Water Content
Rasio Do not accept
credit
0
138
Accepting credit 1
Factors that
influence the
incidence of low
birth weight infants
(LBW)
Pandin
[19]
=Mother Age
=Parity
=Birth distance
=Anemia
=Nutrition Status
=Education
Nominal Case of LBW 1
96Not a case of
LBW
2
7. Int J Elec & Comp Eng ISSN: 2088-8708
An Influence of Measurement Scale of Predictor Variable on Logistic Regression… (Waego Hadi Nugroho)
339
3.2. Research Stages
In this study, the first step is to divide each data into 2 parts, 70% for training and 30% for testing
dataset, then continued logistic regression analysis and LVQ analysis. Binary logistic regression analysis was
performed using software R. The steps in binary logistic regression analysis were
a. Check for the presence or absence of multicollinearity between independent variables X
b. Parameter estimation
c. Testing parameters simultaneously
d. Partial parameter test
e. The establishment of a logistic regression model
f. Check the suitability of the model
g. Apply binary logistic regression model obtained from training data to testing data
h. Forming a table of precision classification of training and testing models.
While the analysis of LVQ is done using package class on software R. The steps in LVQ analysis
are:
a. Forming an input matrix on training data and forming a vector or classification factor for training data
b. Initialization weights of the LVQ network
c. Determine the learning rate (α)
d. Renew weight on the competitive layer to obtain optimum weight
e. Form an optimal architecture
f. Re-classified the testing dataset based on the best architecture of the LVQ method formed from the
training dataset.
After the completion of the LVQ analysis done then calculated indicator of classification accuracy
that is Hit Ratio and next to compare value of Hit Ratio from both methods.
4. RESULTS AND ANALYSIS
4.1. Logistic Regression Modeling
The process of data analysis begins with multicolinearity testing among the predictor variables on
each dataset. In the three datasets used in this study, there is no multicollinearity that is indicated by the VIF
value greater than 10. The modeling process in logistic regression of the three datasets can be continued by
estimating the model parameters of each dataset. Below is the parameter model estimator obtained by the
maximum likelihood method, presented in Table 2.
Table 2. The Parameter Estimation Results on All Datasets
Predictors
Values of estimates parameter
Dataset 1 Dataset 2 Dataset 3
5.760 0.234 0.619
2.184 0.002 -0.120
2.279 0.035 -0.191
4.415 -0.089 -1.382
4.955 0.370 1.092
X6 -2.976 2.332
Constant -100.753 45.289 2.153
The full model formed for dataset 1, dataset 2, and dataset 3 respectively are
( ) .
( )
( ) .
Tests on parameter estimators are simultaneously performed to determine the effect of predictor
variables contained in the logistic regression model to the response variable as a whole or together. This test
is based on the ratio test statistic (G) likelihood ratio test. Here is the hypothesis used:
versus j=1, 2, ..., p
The test results against parameter estimators simultaneously on the datasets 1, 2, and 3 as shown in
Figure 3.
8. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 1, February 2018 : 333 – 343
340
Table 3. Simultaneously Parameter Testing of All Datasets
Dataset
-2 log likelihood
Likelihood
Ratio Test
p-valueModel
intersep
Model
parameter
1 -43.801 -9.750076 68.103 <0.000
2 -61.508 -7.105 108.805 0.000
3 -46.374 -37.854 17.039 0.009
Based on the testing of parameters simultaneously in Table 2. for dataset 1 it can be seen that the p-
value of the likelihood ratio test is <0.000. The value is less than the level of significant α is 0.05, so it was
decided to reject H0 which means that carbohydrates, vegetables, side dishes, fruits, and milk together
significantly affect the diet status of children under five. The result of statistical test on dataset 2 can be seen
that p-value of likelihood ratio test is 0.000. The value is less than the significant level of α is 0.05, so it is
decided to reject H0 which means that experience, duration of education, labor outpour, age, level of seaweed
cleanliness, and seaweed contents significantly affect farming credit to farmers seaweed. While the test
results in data 3 can be seen that the p-value of the likelihood ratio test is 0.009. The value is less than α is
0.05, so it is decided to reject H0 which means that maternal age, parity, gestational distance, anemia,
nutritional status, and education together have a significant effect on the incidence of low birth weight babies.
To determine the predictor variables that significantly influence the response variables, it is
necessary to test the significance of the parameters in each predictor variables using Wald (Wj) test statistic.
The statistical hypothesis tested is H0: βj=0 versus H1: βj ≠ 0. Wald test statistic is Chi-square distributed with
one degree of freedom. Based on p-value on Wald test statistic, for the first data, it was found that X2
(vegetable) and X3 (side dish) variables did not significantly affect the classification of infant diet. In the
second data, only predictor X5 (level of seaweed cleanliness) has a significant effect on the determination of
credit for seaweed farmers. As for the third data, it is known that the X1 (maternal age), X2 (parity), X3 (birth
distance), and X5 (nutritional status) did not significantly affect the classification of low birth weight babies.
The logistic regression model that is formed based on significant predictor variables are as sown in Figure 4:
Table 4. The Logistic Regression Model of All Datasets and Goodness of Fit Test
Dataset The final model of logistic regression Chi-Square df p-value
1 ( ) 2.897 8 0.941
2 ( ) 7.305 1 0.007
3 ( ) 0.518 2 0.772
Testing the suitability (goodness) model used to determine whether the resulting model is
appropriate (feasible). The statistical hypothesis used in this test is: ̂ (observation
frequency=expected frequency) versus ̂ (observation frequency ≠ expected frequency). Based on
Table 3. p-value for the 1st and 3rd data has a value of more than 0.05 so the decision is to receive H0 and it
can be concluded that the binary logistic regression model generated is good (appropriate), the model has
been sufficient to explain the first data (classification of infant diet), and the 3rd data (classification of low
birth weight infants). For the 2nd data because p-value has a value less than 0.05 it can be concluded that the
binary logistic regression model generated in the 2nd data has not been suitable or not enough to explain the
data. Based on these results, statistically, logistic regression models of the 1st and 3rd data can be used for
object classification and should be able to produce fairly good classification accuracy.
4.2. Learning Vector Quantization (LVQ) Optimum Model
To obtain an optimal LVQ network model, LVQ network modeling process is performed on a
various number of neurons in a hidden layer called codebook. The amount of codebook used is 2, 10, 30, and
50. The size of the codebook will determine the weight matrix dimension to be calculated to obtain optimal
LVQ network. Based on the dimension of this weight matrix, then the weights are randomly initialized. The
optimal weight will be obtained through the training process by utilizing the training data input. After all the
connecting weights between nodes in different layers of the LVQ network have been obtained, the network
output can be obtained by inputting the input data into the network. If used as an input argument is training
data then obtained the output of training data. Similarly, if used as an input argument is testing data then
obtained the output of testing data. The value of Hit Ratio can be calculated from the output obtained from
the network, either output of training or testing data. The initialized weights of the four codebooks for the
first data are shown in Table 5.
9. Int J Elec & Comp Eng ISSN: 2088-8708
An Influence of Measurement Scale of Predictor Variable on Logistic Regression… (Waego Hadi Nugroho)
341
Table 5. The initial weights between input and hidden layer of the first dataset
Size
Codebook
Neuron
Hidden
( ) ( ) ( ) ( ) ( )
2
1 4.73 3.58 5.01 4.18 5.43
2 6.70 4.76 4.33 5.00 5.24
10
1 5.34 2.62 4.42 3.36 3.78
2 4.58 3.71 3.30 3.30 5.00
... ... ... ... ... ...
10 6.70 4.40 4.83 3.34 7.50
30
1 5.34 2.62 4.42 3.36 3.78
2 4.73 3.04 2.80 2.34 3.68
... ... ... ... ... ...
30 6.10 4.14 5.49 5.28 5.00
50
1 4.73 3.58 5.01 4.18 5.43
2 5.60 4.50 2.84 3.30 4.32
... ... ... ... ... ...
50 6.18 4.91 5.19 4.00 5.43
As explained in the previous session that data 1 has 5 input variables considered to have an effect on
the response variable. In codebook=2, it must be initialized a 5x2 matrix whose elements are the connecting
weights between the input layer and the hidden layer, and a 2x1-sized vector whose elements are the
connecting weights of the hidden layer to the output layer, whereas in the codebook=50, The dimensions of
the weighted matrices and vectors to be initialized are 5x50 and 50x1 which are the connecting weights
between the input layer and the hidden layer, and the connecting weights of the hidden layer to the output
layer. After the training process, finally we get the optimal weight which some of the elements are presented
in Table 6 as follows:
Table 6. The Final Weights between Input and Hidden Layer of the First Dataset
Size
Codebook
Neuron
Hidden
( ) ( ) ( ) ( ) ( )
2
1 4.889 3.827 4.562 3.949 4.954
2 6.226 4.735 5.110 4.902 6.293
10
1 5.141 3.114 4.345 3.810 3.920
2 4.912 3.760 3.036 3.047 4.457
... ... ... ... ... ...
10 5.939 4.835 4.285 3.892 7.170
30
1 5.340 2.620 4.420 3.360 3.780
2 4.730 3.040 2.800 2.340 3.680
... ... ... ... ... ...
30 5.756 4.469 4.899 5.509 5.255
50
1 4.947 3.848 5.303 4.085 5.130
2 5.600 4.500 2.840 3.300 4.320
... ... ... ... ... ...
50 6.306 5.136 5.327 4.076 5.739
Table 7. Hit Ratio for All Dataset of Various Codebook Size
Data
Codebook
size
Hit Ratio
Training Data Testing Data
1
2 84.21 84.4
10 92.1 81.25
30 93.4 78.12
50 96.05 78.12
2
2 67.01 75.61
10 80.41 65.85
30 83.5 70.73
50 91.75 73.17
3
2 61.2 37.93
10 74.62 44.82
30 76.11 55.17
50 76.12 44.82
The final weight of LVQ that has been obtained from LVQ network learning process using training
data is then used as network weight. Thus the LVQ Network already has the connecting weights of each node
between the layers, so that the LVQ network is ready to be used as a model for object classification. Table 8
10. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 1, February 2018 : 333 – 343
342
is an indicator of LVQ network performance that is Hit Ratio value from three datasets and various
codebook.
Based on the value of Hit Ratio in Table 7 we get the best LVQ network model for dataset 1 and
dataset 2 is the model with codebook=2, whereas, in dataset 3, the best model is codebook=30. The models
are said to be the best models because they have the greatest Hit Ratio value in the data testing. If Table 7 is
observed further, it can be said that the increase in the number of codebooks is also followed by the increase
of Hit Ratio value in training data, but the incidence does not apply to the value of Hit ratio in data testing.
This phenomenon is known as overfitting.
4.3. The Comparison of Accuracy Classification between Logistic Regression and LVQ
The performance of the two methods of classifying objects is compared by the value of Hit Ratio
calculated on both training and testing datasets of all the best models. The results of the classification
accuracy of all datasets are presented in Table 8 below:
Table 8. The Acuracy of the Best Model from Both Logistic Regression and LVQ
Data Training Testing
Logistic Regression LVQ Logistic Regression LVQ
90.8 84.21 84.4 84.4
- 67.01 - 75.61
65.7 76.11 55.17 55.17
Based on the percentage of accuracy of classification results in Table 8. it can be seen that in the
second dataset (granting of seaweed farming), logistic regression method can not be calculated classification
accuracy because the model obtained from the data does not meet the model fit test. This is supported by
partial parameter test result only obtained one predictor variable (level of seaweed cleanliness) which have
significant effect to response variable (credit approval decision to seaweed farmer). Keep in mind that the
second dataset has predictor variables that are all scalable ratios. Different accuracy results are found in the
LVQ model that is in the second dataset still obtained the value of Hit Ratio, both in the training and testing
dataset of moderate enough size of 67% and 75% respectively.
Modeling on the first dataset is a case that ideally demonstrates that logistic regression model
performs equally well compared to LVQ model based on Hit ratio on dataset testing=84%. Having studied
more deeply to this logistic regression model, it turns out in the process of diagnostic examination of the
error obtained the result that the error of this model is able to meet all the assumptions in the regression
modeling. The assumptions are error independenly each others, error has a constant varian, and error has
normal distribution. The most difficult thing to be met with regression analysis is the normality assumption
of the distribution of error.
In the third dataset obtained the performance of both models are just as bad that is shown by the
value of Hit ratio on dataset testing=55%. It should be noted carefully that the predictor variables are all
nominal scale measurements. In the logistic regression model, only two categories of predictors (ie, anemia
and low educated mothers) had a significant effect on low birth weight (LBW) infants. This implies that
logistic regression modeling and LVQ network on predictive variables of nominal scale require more factors
that influence the response variable. Although the LVQ model on the training dataset has Hit Ratio=76%, the
model also remains unable to classify the test dataset satisfactorily.
5. CONCLUSION
The measurement scale of the predictor variable is very influential on the modeling and performance
of the classification model, both logistic regression model, and LVQ model. In the interval-scale predictor
variable, the best model of both methods produces an equally high accuracy of Hit Ratio=84%. In the
nominal-scale predictor variable, the classification accuracy of the best model in both methods is similarly
low: Hit Ratio=55%. While on Ratio-scale predictor variable, logistic regression modeling did not produce
the best model, But the resulting LVQ model has an accuracy for fairly moderate object classification, ie Hit
Ratio=75%.
11. Int J Elec & Comp Eng ISSN: 2088-8708
An Influence of Measurement Scale of Predictor Variable on Logistic Regression… (Waego Hadi Nugroho)
343
REFERENCES
[1] N. Suciati, W.A. Pratomo and D. Purwitasari, “Batik Motif Classification using Color-Texture-Based Feature
Extraction and Backpropagation Neural Network”, Proceeding IIAI 3rd
International Conference on Advanced
Applied Informatics, 2014, pp. 517–521.
[2] A.A. Kasim, R. Wardoyo and A. Harjoko, “Batik Classification with Artificial Neural Network Based on Texture-
Shape Feature of Main Ornament”, I.J. Intelligent Systems and Applications, vol. 6, pp. 55-65, 2017.
[3] Agresti, Categorical Data Analysis, John Wiley & Sons, New York, 2002.
[4] D.W. Hosmer and S. Lemeshow, Applied Logistic Regression, Second Edition, Willey-Inter science, Canada, 2000.
[5] J.F. Hair, et al., Multivariate Data Analysis, Seventh Edition, Prentice Hall International, New Jersey, 2010.
[6] I. Fausett, Fundamentals of Neural Network, Architecture, Algoritms and Aplications, Printice-Hall, London, 1994.
[7] R. Hecht-Nielsen, “Neurocomputing application”, Neural Computer, pp.445-453, Springer-Verlag, Berlin, 1998.
[8] D.J. Hand, and W.E. Henley, “Statistical Classification methods in consumer credit scoring: a Review”, J.R.
Statistics Society, v ol.160, part.3, pp.521-542, 1997.
[9] Dreiseitl and L. Ohno-Machado, “Logistic regression and artificial neural network classification models: a
methodology review”, Journal of Biomedical Informatics, vol. 35, pp. 352–359, 2003.
[10] X.B. Zhoua, et al., “Cancer classification and prediction using logistic regression with Bayesian gene selection”,
Journal of Biomedical Informatics, vol. 37, pp. 249–259, 2004.
[11] A. Felicisimo, et al., “Mapping landslide susceptibility with logistic regression, multiple adaptive regression
splines, classification and regression trees, and maximum entropy methods”, Landslides, Springer-Verlag, 2012.
[12] M. Shafiee, et al., “Forecasting Stock Returns Using Support Vector Machine and Decision Tree: A Case Study in
Iran Stock Exchange”, International Journal of Economy, Management and Social Sciences, vol/issue: 2(9), pp.
746-751, 2013.
[13] S. Kamley, S. Jaloree and R.S. Thakur, “Performance Forecasting of Share Market using Machine Learning
Techniques: A Review”, International Journal of Electrical and Computer Engineering (IJECE), Vol. 6, No. 6, pp.
3196-3204, 2016.
[14] T. Kohonen, “Self-Organizing map”, Springer Series In Information Sciences, Springer, Berlin, 1995.
[15] S. Kaski and T. Kohonen, “Exploratory data analysis by the selt-organizing map: Structures of welfare and poverty
in the world”, Neural Networks in Financial Engineering, World Scientific, Singapore, 1996.
[16] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, Fifth Edition, Prentice Hall
International Inc., New Jersey, 2007.
[17] M. Sartika, “Relationship of Diet and Intake of Nutrition in Short Children in Working Area of Kapuas Hilir
District Health Center Kapuan Indonesia”, Thesis, Universitas Brawijaya Malang, 2012.
[18] Sunadji, “Seaweed Cultivation Development Model in Kupang District, NTT Province, Indonesia (Policy
Simulation with Household Economic Approach)”, Thesis, Universitas Brawijaya Malang, 2012.
[19] P.K. Pandin, “Factors Influencing Low Birth Weight (LBW) Incidence in Jumpandang Baru Makassar ,Indonesia”,
Thesis, Universitas Brawijaya Malang, 2015.
BIOGRAPHIES OF AUTHORS
Waego Hadi Nugroho is a Professor and a researcher at Department of Statistics, Faculty of
Mathematics and Natural Sciences, Universitas Brawijaya Malang. He is also a member of the
Indonesian Mathematical Society. He received the B.E. degree in Agriculture from Universitas
Brawijaya, Malang, Indonesia in 1976 and the Ph.D degree in Biometrics from University of
Adelaide, Adelaide, Australia, in 1985. His research interests include Statistical Modeling,
Sampling Tehnique and survey, and Design of Experiments.
Samingun Handoyo is a lecturer and a researcher at Department of Statistics, Faculty of
Mathematics and Natural Sciences, Universitas Brawijaya, Malang. He is also a member of the
Indonesian Mathematical Society. He received the B.S. degree in Mathematics from Universitas
Brawijaya, Malang, Indonesia in 1997 and the M.Cs degree in Computer Science from Universitas
Gadjah Mada, Yogyakarta, Indonesia in 2010. His research interests include statistical computing,
Neural Network, Fuzzy System, and Partial Least Square path modeling using R Software.
Yusnita Julyarni Akri is a lecturer and researcher at Department of Midwifery Educators at
Tribhuwana Tunggadewi University, Malang. She is also a member of the Indonesian Doctor
Society. She received her Bachelor Degree in Medical Sciences from Wijaya Kusuma University,
Surabaya, Indonesia in 2003, her M.D. from Wijaya Kusuma University, Surabaya, Indonesia in
2005 and Masters Degree in Health Sciences in 2013 from Sebelas Maret University, Surakarta,
Indonesia. Her research interest includes health sciences, midwifery practice, community health,
and traditional health practice.