Predictive maintenance and condition-based monitoring systems have gained significant prominence in
recent years as a way to minimize the impact and cost of machine downtime on production. Predictive
maintenance uses concepts from data mining, statistics, and machine learning to build models capable
of early fault detection, fault diagnosis, and prediction of the time to failure. Fault diagnosis is one of
the core areas, in which the actual failure mode of the machine is identified. In fluctuating environments
such as manufacturing, clustering techniques have proved more reliable than supervised learning methods.
One of the fundamental challenges of clustering is formulating a test hypothesis and choosing an
appropriate statistical test for hypothesis testing. Most statistical analyses rest on underlying assumptions
about the data that most real-world data cannot satisfy. This paper is dedicated to overcoming this
challenge by developing a test hypothesis for a fault diagnosis application using a clustering technique
and performing a PERMANOVA test for hypothesis testing.
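The PERMANOVA test mentioned above is a permutation test on a pseudo-F statistic computed from a distance matrix. The following is a minimal NumPy sketch of one-way PERMANOVA (Anderson's formulation), not the paper's implementation; function and variable names are our own.

```python
import numpy as np

def permanova(dist, labels, n_perm=999, seed=None):
    """One-way PERMANOVA: permutation test on the pseudo-F statistic
    computed from a square distance matrix (Anderson's formulation)."""
    rng = np.random.default_rng(seed)
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    sq = dist ** 2
    # Total sum of squares from all pairwise distances.
    ss_total = sq[np.triu_indices(n, 1)].sum() / n

    def pseudo_f(lab):
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(lab == g)
            sub = sq[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    # p-value: fraction of label permutations with an F at least as extreme.
    exceed = sum(pseudo_f(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)
```

For two well-separated clusters the pseudo-F is large and the permutation p-value small, which is what makes the test usable as a hypothesis test on cluster structure without distributional assumptions on the raw data.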
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation (Devansh16)
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L. Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous effort from medical experts. Therefore, AI has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools depend heavily on data for training the models. However, there are several constraints on accessing large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that this synthetic data generation pipeline can be used to bypass privacy concerns and to produce artificial segmentation datasets with corresponding ground truth masks, avoiding the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ on both the real polyp segmentation dataset and the corresponding synthetic dataset generated by the SinGAN-Seg pipeline, we show that the synthetic data can achieve performance very close to that of real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated by the SinGAN-Seg pipeline improves the performance of segmentation algorithms when the training dataset is very small. Since the SinGAN-Seg pipeline is applicable to any medical dataset, it can be used with any other segmentation dataset.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
A comparative analysis of classification techniques on medical data sets (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of Engineering and Technology.
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techniques (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning: it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data is even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. To improve prediction accuracy and to avoid the incomprehensibility caused by the sheer number of features, different feature selection techniques can be applied. This survey classifies and analyzes the different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three categories: filter, wrapper, and hybrid.
A comprehensive study on disease risk predictions in machine learning (IJECEIAES)
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions in medical databases, along with the growing evaluation of disease prediction models, has become crucial; traditional clinical findings require many trials, which can complicate disease prediction. A comprehensive study of the different strategies used to predict disease is presented in this paper. Applying these techniques to healthcare data has improved risk prediction models that identify the patients who would benefit from disease management programs, reducing hospital readmissions and healthcare costs, although the results of these endeavors have varied.
Some (but not all) abstracts of good research papers published in good journals, with a variety of ideas from innovative research that puts computer science to use: these are real-life applications.
Simplified Knowledge Prediction: Application of Machine Learning in Real Life (Peea Bal Chakraborty)
Machine learning is the scientific study of the algorithms and statistical models that machines use to perform a specific task by relying on patterns and inference rather than explicit instructions. This research aims to observe how precisely a machine can predict whether a patient suspected of breast cancer has a malignant or benign tumor. In this paper, the classification of cancer type and the prediction of risk levels are performed with various machine learning models and depicted pictorially with visual analytics tools.
Decision Tree Classifiers to determine the patient’s Post-operative Recovery Status (Waqas Tariq)
Machine learning aims to generate classifying expressions simple enough to be easily understood by humans. Many machine learning approaches are available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic decision-tree-based approach to automatically determine the patient’s post-operative recovery status. Decision tree structures are constructed using data mining methods and then used to classify discharge decisions.
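As a rough illustration of the decision-tree idea, a one-level tree (a decision stump) can be fit by exhaustively searching for the best single split. This is a minimal sketch, not the paper's classifier; the data below are hypothetical stand-ins for post-operative measurements.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-level decision tree (a 'stump') on binary labels in {0, 1}
    by exhaustively searching every feature/threshold split."""
    best = (0, 0.0, 0, 1, 1.0)  # (feature, threshold, left label, right label, error)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            for lo, hi in ((0, 1), (1, 0)):
                err = (np.sum(y[left] != lo) + np.sum(y[~left] != hi)) / n
                if err < best[4]:
                    best = (j, t, lo, hi, err)
    return best

def predict_stump(stump, X):
    j, t, lo, hi, _ = stump
    return np.where(X[:, j] <= t, lo, hi)

# Hypothetical stand-in data: a single "temperature" feature, with label 1
# meaning "keep under observation" (names and values are illustrative only).
X = np.array([[36.2], [36.5], [36.8], [37.4], [37.9], [38.3]])
y = np.array([0, 0, 0, 1, 1, 1])
stump = fit_stump(X, y)
print(predict_stump(stump, X))  # recovers the labels exactly: [0 0 0 1 1 1]
```

A full decision tree simply applies this split search recursively to each resulting subset, which is why the resulting classifier stays readable by humans.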
Maximize Your Understanding of Operational Realities in Manufacturing with Predictive Insights (Bigfinite)
Maximize Your Understanding of Operational Realities in Manufacturing with Predictive Insights using Big Data, Artificial Intelligence, and Pharma 4.0
by Toni Manzano, PhD, Co-founder and CSO, Bigfinite
PDA Annual Meeting 2020
A Framework for Statistical Simulation of Physiological Responses (SSPR) (Waqas Tariq)
The problem of selecting, from a large number of variables, those that predict certain important dependent variables has long interested both applied statisticians and researchers in applied physiology, and various statistical techniques have been developed for this purpose. This framework embeds various statistical techniques of sampling and resampling and supports statistical simulation of physiological responses under different environmental conditions. Population generation and other statistical calculations are based on the inputs provided by the user: the mean vector, the covariance matrix, and the data. The framework is designed to work with original data as well as with simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. Using these two inputs, the framework can generate a simulated multivariate normal population for any number of variables. The software turns a manual procedure into a computer-based system, providing efficiency, accuracy, timeliness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data are analyzed using statistical techniques, the results will match those obtained from the original data. The system also helps when data are missing for some of the variables. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even when the actual data are not available.
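The multivariate-normal simulation step described above can be sketched in NumPy: given a user-supplied mean vector and covariance matrix, draw a simulated population. The variable names and values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical user inputs: a mean vector and covariance matrix for three
# physiological responses (the variable meanings are illustrative only).
mean = np.array([72.0, 120.0, 36.6])        # e.g. heart rate, systolic BP, temperature
cov = np.array([[25.0,  5.0, 0.2],
                [ 5.0, 90.0, 0.1],
                [ 0.2,  0.1, 0.09]])

rng = np.random.default_rng(0)
population = rng.multivariate_normal(mean, cov, size=200_000)
print(population.shape)  # (200000, 3)
```

Because the mean vector and covariance matrix are sufficient statistics for the multivariate normal, the sample mean and sample covariance of a large simulated population recover the inputs up to sampling error, which is why analyses on the simulated data agree with those on the original data.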
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION (IJDKP)
Many classification algorithms are available in data mining for solving the same kind of problem, with little guidance on recommending the most appropriate algorithm, i.e., the one that gives the best results for the dataset at hand. To improve the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the factors that data miners and researchers have considered in different studies when selecting classification algorithms. The paper divides the factors affecting classification algorithm recommendation into business and technical factors. The proposed technical factors are measurable and can be exploited by recommendation software tools.
BRAIN TUMOR MRI IMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI... (ijistjournal)
Feature extraction is a method of capturing the visual content of an image: the process of representing a raw image in reduced form to facilitate decision making such as pattern classification. We address the problem of classifying MRI brain images by creating a robust and more accurate classifier that can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. The approach combines intensity, texture, and shape-based features and classifies the tumor region as white matter, gray matter, CSF, abnormal, or normal area. The experiment is performed on 140 tumor-containing brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been evaluated on a larger database than previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets to reduce the number of features used, and a Support Vector Machine (SVM) classifier served to compare nonlinear techniques with linear ones. Feature selection using the proposed technique is beneficial because it analyzes the data according to the grouping class variable and yields a reduced feature set with high classification accuracy.
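The PCA feature-reduction step described above can be sketched in NumPy via the SVD of the centered data. This is a generic illustration on synthetic features, not the paper's code.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the feature matrix X (samples x features) onto its top
    principal components, computed from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal axes
    return Xc @ Vt[:n_components].T

# Synthetic stand-in for an image-feature matrix: 100 samples, 6 features,
# with the signal confined to 2 latent directions plus a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(100, 6))

Z = pca_reduce(X, 2)
print(Z.shape)  # (100, 2)
```

The reduced matrix Z can then be fed to any classifier (an SVM in the paper's setup); because the synthetic signal lives in two directions, two components retain nearly all of the variance here.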
Research paper impact evaluation for collaborative information supply chain (Kenny Meesters)
Emerging technologies provide opportunities for the humanitarian responder community to enhance the effectiveness of their response to crisis situations. Part of this development can be attributed to a new type of information supply chain, driven by collaboration with digital, online communities, that enables organizations to make better-informed decisions. However, how exactly, and to what extent, this collaboration impacts the decision-making process is unknown. To improve these new information exchanges and the corresponding systems, an evaluation method is needed to assess the performance of these processes and systems. This paper builds on existing evaluation methods for information systems, and on design principles, to propose such an impact evaluation framework. The proposed framework has been applied in a case study to demonstrate its potential to identify areas for further improvement in the (online) collaboration between information suppliers and users.
SGS Biopharm Day 2016 - Modeling & simulation in Phase 1 (Ruben Faelens)
This slide deck shows how Model-Informed Drug Development can be applied in a Phase 1 first-in-human trial, informing the starting dose, dose escalation, and stopping dose.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA... (ijsc)
As the size of biomedical databases grows day by day, finding the essential features for disease prediction has become more complex due to high dimensionality and sparsity. Moreover, given the large number of microarray datasets in biomedical repositories, it is difficult to analyze, predict, and interpret feature information using traditional feature-selection-based classification models, most of which suffer from computational issues such as dimension reduction, uncertainty, and class imbalance on microarray datasets. The ensemble classifier is a scalable model for extreme learning machines owing to its high efficiency and fast processing speed in real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high-dimensional data with high computational efficiency and a high true positive rate. In the proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model was developed on high-dimensional microarray datasets. Experimental results show that the proposed model outperforms traditional feature-selection-based classification models in terms of accuracy, true positive rate, and error rate.
ICU Patient Deterioration Prediction: A Data-Mining Approach (csandit)
A huge amount of medical data is generated every day, which presents a challenge in analysing these data. The obvious solution to this challenge is to reduce the amount of data without information loss. Dimension reduction is considered the most popular approach for reducing data size and also to reduce noise and redundancies in data. In this paper, we investigate the effect of feature selection in improving the prediction of patient deterioration in ICUs. We consider lab tests as features. Thus, choosing a subset of features would mean choosing the most important lab tests to perform. If the number of tests can be reduced by identifying the most important tests, then we could also identify the redundant tests. By omitting the redundant tests, observation time could be reduced and early treatment could be provided to avoid the risk. Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art feature selection for predicting ICU patient deterioration using the medical lab results. We apply our technique on the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH (cscpconf)
A huge amount of medical data is generated every day, which presents a challenge in analysing
these data. The obvious solution to this challenge is to reduce the amount of data without
information loss. Dimension reduction is considered the most popular approach for reducing
data size and also to reduce noise and redundancies in data. In this paper, we investigate the
effect of feature selection in improving the prediction of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a subset of features would mean choosing the
most important lab tests to perform. If the number of tests can be reduced by identifying the
most important tests, then we could also identify the redundant tests. By omitting the redundant
tests, observation time could be reduced and early treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deterioration using the medical lab results. We
apply our technique on the publicly available MIMIC-II database and show the effectiveness of
the feature selection. We also provide a detailed analysis of the best features identified by our
approach.
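The feature-selection idea above can be illustrated with the simplest kind of univariate filter: rank lab tests by absolute correlation with the outcome and keep the top-ranked ones. This is a generic sketch on synthetic data, not the paper's method, and it does not use the MIMIC-II data.

```python
import numpy as np

def rank_features(X, y):
    """Rank feature columns by absolute Pearson correlation with the
    binary outcome: a simple univariate filter score."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Synthetic stand-in for lab-test data: 300 "patients" and 8 "lab tests",
# where only tests 0 and 3 carry signal about deterioration (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.8 * X[:, 3] + 0.5 * rng.normal(size=300) > 0).astype(float)

order, scores = rank_features(X, y)
print(order[:2])  # the two informative tests should rank first
```

Tests that rank near the bottom are candidates for the "redundant" set the abstract describes: omitting them shortens observation time and cuts cost without losing predictive information.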
Simplified Knowledge Prediction: Application of Machine Learning in Real LifePeea Bal Chakraborty
Machine learning is the scientific study of algorithms and statistical models that is used by the machines to perform a specific task depending on patterns and inference rather than explicit instructions. This research and analysis aims to observe how precisely a machine can predict that a patient suspected of breast cancer is having malignant or benign cancer.In this paper the classification of cancer type and prediction of risk levels is done by various model of machine learning and is pictorially depicted by various tools of visual analytics.
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Waqas Tariq
Machine Learning aims to generate classifying expressions simple enough to be understood easily by the human. There are many machine learning approaches available for classification. Among which decision tree learning is one of the most popular classification algorithms. In this paper we propose a systematic approach based on decision tree which is used to automatically determine the patient’s post–operative recovery status. Decision Tree structures are constructed, using data mining methods and then are used to classify discharge decisions.
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite
Maximize Your Understanding of Operational Realities in Manufacturing with Predictive Insights using Big Data, Artificial Intelligence, and Pharma 4.0
by Toni Manzano, PhD, Co-founder and CSO, Bigfinite
PDA Annual Meeting 2020
A Framework for Statistical Simulation of Physiological Responses (SSPR).Waqas Tariq
The problem of variable selection from a large number of variables to predict certain important dependent variables has been of interest to both applied statisticians and other researchers in applied physiology. For this purpose, various statistical techniques have been developed. This framework embedded various statistical techniques of sampling and resampling and help in Statistical Simulation for Physiological Responses under different Environmental condition. The population generation and other statistical calculations are based on the inputs provided by the user as mean vector and covariance matrix and the data. This framework is developed in a way that it can work for the original data as well as for simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. This framework uses these two inputs and is able to generate simulated multivariate normal population for any number of variables. The software changes the manual operation into a computer-based system to automate the study, provide efficiency, accuracy, timelessness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data is analyzed using statistical techniques; the results of such analysis will be the same as that using the original data. If the data is missing for some of the variables, in that case the system will also help. Conclusion: The proposed system makes it possible to carry out the physiological studies and statistical calculations even if the actual data is not present.
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONIJDKP
A lot of classification algorithms are available in the area of data mining for solving the same kind of problem with a little guidance for recommending the most appropriate algorithm to use which gives best results for the dataset at hand. As a way of optimizing the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the different factors considered by data miners and researchers in different studies when selecting the classification algorithms that will yield desired knowledge for the dataset at hand. The paper divided the factors affecting classification algorithms recommendation into business and technical factors. The technical factors proposed are measurable and can be exploited by recommendation software tools.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
Research paper impact evaluation for collaborative information supply chainKenny Meesters
Emerging technologies provide opportunities for the humanitarian responders’ community to enhance the
effectiveness of their response to crisissituations. A part of this development can be contributed to a new type of
information supply chains -driven by collaboration with digital, online communities- enabling organizations to
make better informed decisions. However, how exactly and to what extend this collaboration impacts the
decision making process is unknown. To improve these new information exchanges and the corresponding
systems, an evaluation method is needed to assess the performance of these processes and systems. This paper
builds on existing evaluation methods for information systems and design principles to propose such an impact
evaluation framework. The proposed framework has been applied in a case study to demonstrate its potential to
identify areas for further improvement in the (online) collaboration between information suppliers and users.
SGS Biopharm Day 2016 - Modeling & simulation in Phase 1Ruben Faelens
This slidedeck shows how Model-Informed Drug Development can be applied in a Phase 1 first-in-human trial, informing starting dose, dose escalation as well as stop dose.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As the size of the biomedical databases are growing day by day, finding an essential features in the disease prediction have become more complex due to high dimensionality and sparsity problems. Also, due to the
availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using the traditional feature selection based classification models. Most of the traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. Ensemble classifier is one of the scalable models for extreme learning machine due to its high efficiency, the fast processing speed for real-time applications. The main objective of the feature selection
based ensemble learning models is to classify the high dimensional data with high computational efficiency
and high true positive rate on high dimensional datasets. In this proposed model an optimized Particle swarm optimization (PSO) based Ensemble classification model was developed on high dimensional microarray
datasets. Experimental results proved that the proposed model has high computational efficiency compared to the traditional feature selection based classification models in terms of accuracy , true positive rate and error rate are concerned.
ICU Patient Deterioration Prediction : A Data-Mining Approachcsandit
A huge amount of medical data is generated every day, which presents a challenge in analysing these data. The obvious solution to this challenge is to reduce the amount of data without information loss. Dimension reduction is considered the most popular approach for reducing data size and also to reduce noise and redundancies in data. In this paper, we investigate the effect of feature selection in improving the prediction of patient deterioration in ICUs. We consider lab tests as features. Thus, choosing a subset of features would mean choosing the most important lab tests to perform. If the number of tests can be reduced by identifying the most important tests, then we could also identify the redundant tests. By omitting the redundant tests, observation time could be reduced and early treatment could be provided to avoid the risk. Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art feature selection for predicting ICU patient deterioration using the medical lab results. We apply our technique on the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin... (IJCNCJournal)
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of class imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses key challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. The number of relevant features selected by Information Gain also has a considerable impact on the F1-score and detection accuracy. Specifically, ensemble learning achieved the highest accuracy of 98.36% and an F1-score of 97.98% using the selected features.
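The Information Gain criterion used in the paper scores a feature by how much it reduces the entropy of the class labels. A minimal pure-Python sketch of that computation (function names and toy data below are illustrative, not taken from the paper):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on `feature` values."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# A perfectly predictive feature recovers the full label entropy (1 bit here).
feature = [0, 0, 1, 1]
labels  = [0, 0, 1, 1]
print(information_gain(feature, labels))  # → 1.0
```

Features are then ranked by this score and only the top-scoring ones are passed to the classifiers.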
SVM & GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION (ijscai)
Breast cancer is the leading cause of mortality among women in developed countries and the second most prominent cause of cancer mortality among women worldwide. In recent decades, the prevalence of breast cancer in women has risen dramatically. This paper discusses several data analysis methods used to detect breast cancer early; diagnosis distinguishes benign from malignant breast lumps. Data mining is an important step of knowledge discovery in which intelligent methods are used to detect patterns, and several clinical breast cancer studies have been conducted using soft computing and machine learning techniques, some algorithms being simpler, faster, or more comprehensive than others. This research focuses on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer, with the aim of optimizing the testing algorithm: genetic programming methods are used to choose the classifiers' best features and parameter values. We analyze data available from the U.C.I. machine learning repository (Wisconsin breast cancer data set). In this experiment, we compare four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) performs better than the I.B.K. and B.F. Tree methods, reaching 97.71% accuracy.
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach (IJCI JOURNAL)
Exploring and identifying anomalies in time-series data is crucial in today's data-driven world. These data are used to make important decisions; hence, an efficient and reliable anomaly detection system should be part of the process to ensure that the best decisions are made. Anomalies are patterns that deviate from the usual expected behavior and can stem from system failures or unexpected activity. This paper explores different types of anomalies, proposes efficient detection methods, and examines the vulnerabilities of commonly used anomaly detection algorithms such as the Z-score and static-threshold approaches. Each method has its own capabilities and limitations, ranging from statistical methods to machine learning approaches for detecting anomalies in a time-series dataset. Furthermore, the paper explores open-source libraries that can be used to detect anomalies, such as Greykite and the Prophet Python library. It serves as a good starting point for anyone new to anomaly detection.
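The Z-score approach the paper examines flags points that lie many standard deviations from the mean. A minimal sketch of such a detector (the function name, toy data and threshold below are illustrative, not the paper's code):

```python
from statistics import mean, stdev

def zscore_anomalies(series, threshold=3.0):
    """Return the indices whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(series), stdev(series)
    return [i for i, x in enumerate(series) if abs((x - mu) / sigma) > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 100, 10, 9]
print(zscore_anomalies(data, threshold=2.0))  # → [7]
```

A known weakness, which motivates the hybrid fusion the paper argues for, is that a single extreme outlier inflates the standard deviation and can mask smaller anomalies.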
New hybrid ensemble method for anomaly detection in data science (IJECEIAES)
Anomaly detection is a significant research area in data science, used to find unusual points or uncommon events in data streams. It is gaining popularity not only in the business world but also in other fields, such as cyber security, fraud detection for financial systems, and healthcare, and detecting anomalies can reveal new knowledge in the data. This study aims to build an effective model to protect data from these anomalies. We propose a new hybrid ensemble machine learning method that combines the predictions of two methodologies, isolation forest with k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that the proposed model achieves the best results in terms of receiver operating characteristic performance, accuracy, precision, and recall, and is more efficient in detecting anomalies than other approaches. The highest accuracy rate achieved is 99.9%, compared to 97% without the voting method.
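The majority-voting step that fuses the detectors' outputs can be sketched as below. The detector names and 0/1 toy predictions are hypothetical stand-ins, since the paper's actual components (isolation forest with k-means, random forest) require trained estimators:

```python
def majority_vote(*prediction_lists):
    """Combine per-sample 0/1 anomaly predictions from several detectors."""
    combined = []
    for votes in zip(*prediction_lists):
        # A sample is flagged anomalous only if more than half the detectors agree.
        combined.append(1 if sum(votes) > len(votes) / 2 else 0)
    return combined

iforest_kmeans = [0, 1, 1, 0, 1]   # hypothetical detector outputs
random_forest  = [0, 1, 0, 0, 1]
heuristic      = [1, 1, 0, 0, 1]   # a third voter avoids ties with two detectors
print(majority_vote(iforest_kmeans, random_forest, heuristic))  # → [0, 1, 0, 0, 1]
```

Voting suppresses the false positives of any single detector, which is consistent with the accuracy gain the abstract reports for the voted ensemble.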
A Software Measurement Using Artificial Neural Network and Support Vector Mac... (ijseajournal)
Today, software measurement is based on various techniques such as neural networks, genetic algorithms, and fuzzy logic. This study examines the efficiency of applying a support vector machine with a Gaussian radial basis kernel function to the software measurement problem in order to increase performance and accuracy. Support vector machines (SVM) are an innovative approach to constructing learning machines that minimize generalization error. There is a close relationship between SVMs and radial basis function (RBF) classifiers; both have found numerous applications such as optical character recognition, object detection, face verification, and text categorization. The results demonstrate that the accuracy and generalization performance of the SVM with a Gaussian radial basis kernel are better than those of the RBFN. We also examine and summarize several points on which the SVM is superior to the RBFN.
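Both the SVM and the RBF network compared above are built on the same Gaussian kernel, exp(-gamma * ||x - y||^2). A minimal sketch of that kernel function (the gamma value and sample points are illustrative):

```python
from math import exp

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)

# The kernel is 1 for identical points and decays with distance.
print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))            # → 1.0
print(round(rbf_kernel((0.0, 0.0), (1.0, 1.0)), 4))  # exp(-1) → 0.3679
```

The difference between the two models lies not in the kernel but in how the centers and weights are chosen: the SVM selects support vectors by margin maximization, while an RBFN typically picks centers by clustering.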
Similar to FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS TESTING?
FREE PUBLICATION - International Journal of Advances Robotics, Expert Systems... (JaresJournal)
Advance Robotics & Expert Systems carries original articles, review articles, case studies and short communications from all over the world. The main aim of this journal is to extend the state of the art on theoretical, computational and experimental aspects of expert systems related to applied fields such as transportation, surveillance, medical and industrial domains. The journal also focuses on the kinematics, dynamics and synthesis of various robot locomotion mechanisms such as walking, jumping, running, sliding, skating, swimming, flying and rolling.
BIO-INSPIRATIONS AND PHYSICAL CONFIGURATIONS OF SWARM-BOT (JaresJournal)
Throughout history, scientists and engineers have turned to nature for inspiration when solving real-world problems. Observing how groups of bees and ants work together gave rise to the concept of "swarm intelligence". One of the most interesting new research areas of recent decades in robotics is the design of swarm robots that are autonomous and self-intelligent. This concept can be essential for robots exposed to environments that are unstructured or not easily accessible to an individual operator, such as a collapsed building, the deep sea, or the surface of another planet. In this paper, we present a study of the basic bio-inspirations of swarms and their physical configurations, such as reconfigurability, replication and self-assembly, introducing the swarm concept through the swarm-bot, which offers miniaturization together with robustness, flexibility and scalability. The paper discusses swarm-bot intelligence, self-assembly and self-reconfigurability as being among the most important capabilities of swarm robots.
A COMPREHENSIVE SURVEY OF GREY WOLF OPTIMIZER ALGORITHM AND ITS APPLICATION (JaresJournal)
This study presents a comprehensive and thorough summary of the Grey Wolf Optimizer (GWO). The GWO algorithm is a recently presented meta-heuristic inspired by the social hunting behavior of grey wolves. It has become a progressively important tool of swarm intelligence that has been used in nearly all areas of optimization and engineering practice. Numerous problems from different domains have been effectively solved using the GWO algorithm and its variants; in order to apply the algorithm to diverse problems, the original GWO often needs to be modified or hybridized. This study conducts an exhaustive review of this living and advancing area of swarm intelligence, showing that the GWO algorithm can be applied to almost any problem arising in practice. It also enables novice researchers and algorithm developers to use this simple yet very effective algorithm for problem solving, with results that frequently meet expectations.
AN APPROACH TO OPTIMIZE OF MANUFACTURING OF A VOLTAGE REFERENCE BASED ON HETE... (JaresJournal)
In this paper we introduce an approach to increase the density of field-effect transistors within a voltage reference. In the framework of this approach we consider manufacturing the inverter in a heterostructure with a specific configuration. Several required areas of the heterostructure should be doped by diffusion or ion implantation, after which the dopant and radiation defects should be annealed under an optimized scheme. We also consider an approach to decrease the value of mismatch-induced stress in the considered heterostructure, and we introduce an analytical approach to analyze mass and heat transport in heterostructures during manufacturing of integrated circuits, taking mismatch-induced stress into account.
A NOVEL NAVIGATION STRATEGY FOR AN UNICYCLE MOBILE ROBOT INSPIRED FROM THE DU... (JaresJournal)
This paper deals with the design and implementation of a novel navigation strategy for a unicycle mobile robot. The navigation strategy is inspired from nature (the duck's walk) in closed loop. A single sensor (a gyroscope) is used to determine the pose of the robot, hence the proposed name "gyroscopic navigation". A global presentation of the novel strategy and of the robot with its different components is given. An experimental identification of the useful parameters is then exhibited. Moreover, the controller design strategy and simulation results are given. Finally, real-time experiments are carried out to validate the simulation results.
MULTIPLE CONFIGURATIONS FOR PUNCTURING ROBOT POSITIONING (JaresJournal)
The paper presents the closed-form Inverse Kinematics (IK) derivation steps, using a combination of analytical and geometric techniques, for the UR robot. The innovative application of this work is the precise positioning of a puncture robotics system: the end effector is a puncture needle guide tube, which needs precise positioning over the puncture insertion point. The IK closed-form solution yields up to 8 solutions, representing 8 different robot joint configurations. These multiple solutions are helpful in the puncture robotics system, as they allow doctors to choose the most suitable configuration during the operation; the workspace thereby becomes more adequate for the coexistence of human and robot. Moreover, IK closed-form solutions give more precise positioning for medical puncture surgery than numerical methods. We include a performance evaluation of the IK obtained both by the closed-form solution and by a numerical method.
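The full closed-form IK of a 6-DOF UR arm is beyond a short sketch, but the phenomenon of multiple joint configurations for one end-effector position can be illustrated on a 2-link planar arm, which admits an elbow-down and an elbow-up solution. All names, link lengths and targets below are illustrative, not the paper's derivation:

```python
from math import acos, atan2, cos, sin

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Return the elbow-down and elbow-up joint solutions for a planar 2-link arm."""
    d2 = x * x + y * y
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    if abs(c2) > 1:
        return []  # target out of reach
    solutions = []
    for sign in (+1, -1):  # the two elbow configurations
        t2 = sign * acos(c2)
        t1 = atan2(y, x) - atan2(l2 * sin(t2), l1 + l2 * cos(t2))
        solutions.append((t1, t2))
    return solutions

def forward(t1, t2, l1=1.0, l2=1.0):
    """Forward kinematics, used to verify the IK solutions."""
    return (l1 * cos(t1) + l2 * cos(t1 + t2),
            l1 * sin(t1) + l2 * sin(t1 + t2))

for t1, t2 in two_link_ik(1.0, 1.0):
    print([round(v, 6) for v in forward(t1, t2)])  # both configurations reach (1.0, 1.0)
```

Choosing among such alternative configurations is exactly the freedom the paper exploits so that the surgeon can pick the posture that keeps the workspace clear.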
A PROPOSED EXPERT SYSTEM FOR EVALUATING THE PARTNERSHIP IN BANKS (JaresJournal)
Expert systems are no longer just a technology; they have entered many fields of decision-making. In medicine, for example, they help in diagnosing disease and prescribing treatment, and in administration they give the manager a rational decision to solve a problem. A DSS is an interactive information system that provides information, models, and data-processing tools to assist decision-making. Islamic banks, like commercial banks, offer products and services to customers, but they face many problems, the most important of which concern financing: Islamic banks seek to participate in capital rather than lend at interest. The participatory financing system is one of the most important sources of financing within Islamic banks. It is based on an agreement between the bank and the customer to participate in a new or existing project in proportions agreed upon by the bank and the client, but this funding takes a long time and involves many procedures. The researcher has therefore built an expert system to reduce the time it takes to award funding and to reduce procedures, since expert systems have the ability to help humans make decisions. This paper presents an expert system for the co-financing process in Islamic banks in order to save time and effort and maximize profit.
DEVELOPEMENT OF GAIT GENERATION SYSTEM FOR A LOWER LIMB PROSTHESIS USING MEAS... (JaresJournal)
Lower limb dynamic models are useful to investigate the biomechanics of the knee and ankle joints, but many systems have several limitations, which include simplified forces, non-physiologic kinematics and the complicated interactions between the foot and the ground. Many approaches to the control of prosthetic devices use prediction algorithms to estimate ground reaction forces (GRF), which can degrade the performance and efficiency of the devices due to calculation errors. In this study, the variation of the GRF during different gait cycles was used in the design of an adaptive fuzzy controller for a dynamic model of an active ankle-knee prosthesis, and the efficiency of the controller was tested for walking gait, stair ascent and stair descent. Real experimental kinematics of the lower limb and GRF measured by force platforms were selected as the controller inputs, and fuzzy reasoning was used to determine the adequate torques to actuate the prosthetic device model. The capacity of the active prosthesis and the designed controller to provide walking, stair ascent and descent cycles was tested by comparing the gait kinematics to those of a healthy subject.
PSO APPLIED TO DESIGN OPTIMAL PD CONTROL FOR A UNICYCLE MOBILE ROBOT (JaresJournal)
In this work, we propose Particle Swarm Optimization (PSO) to design Proportional Derivative (PD) controllers for the control of a unicycle mobile robot. To stabilize the robot and drive it precisely along a predefined trajectory, a decentralized control structure is adopted in which four PD controllers are used. Their parameters are tuned simultaneously by the proposed PSO algorithm. The deviation of the system from its desired behavior is quantified by an objective function (the squared error, SE). Simulation results are presented to show the efficiency of the method; they are very conclusive and satisfactory in terms of stability and trajectory tracking of the unicycle mobile robot.
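The kind of PSO loop used above to tune controller gains can be sketched as follows. The sphere objective stands in for the paper's squared-error (SE) cost, and all hyper-parameters are illustrative defaults rather than the paper's settings:

```python
import random

def pso(cost, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over the box [-5, 5]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull toward pbest + social pull toward gbest
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Stand-in tuning objective: the optimum of the sphere function is at the origin.
best, best_cost = pso(lambda p: sum(x * x for x in p))
print(best, best_cost)  # the swarm should land very close to the optimum
```

In the paper's setting the cost evaluation would run a closed-loop simulation with the four candidate PD gain vectors and return the accumulated squared tracking error.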
ON OPTIMIZATION OF MANUFACTURING OF ELEMENTS OF AN BINARY-ROM CIRCUIT TO INCR... (JaresJournal)
In this paper we introduce an approach to increase the integration rate of elements of a binary-ROM circuit. In the framework of this approach we consider a heterostructure with a special configuration. Several specific areas of the heterostructure should be doped by diffusion or ion implantation, and the annealing of dopant and/or radiation defects should be optimized.
ANALYSIS OF WTTE-RNN VARIANTS THAT IMPROVE PERFORMANCE (JaresJournal)
Businesses typically have assets such as machinery, electronics or their customers. These assets share a common trait: at some stage they will fail or, in the case of customers, they will churn. Knowing when and where to focus limited resources is a key area of concern for businesses. A prediction model called the WTTE-RNN was shown to be effective for predicting the time to event for problems such as machine failure. The purpose of this research is to identify neural network architecture variants of the WTTE-RNN model that have improved performance.
Predicting Forced Population Displacement Using News Articles (JaresJournal)
The world has witnessed mass forced population displacement across the globe. Population displacement has various causes, with different social and policy consequences. Mitigation of the humanitarian crisis requires tracking and predicting population movements in order to allocate the necessary resources and inform policymakers. The events that trigger population movements can be traced in news articles. In this paper, we propose the Population Displacement-Signal Extraction Framework (PD-SEF) to explore a large news corpus and extract signals of forced population displacement. PD-SEF measures and evaluates violence signals, a critical factor of forced displacement. Following signal extraction, we propose a displacement prediction model based on the extracted violence scores. Experimental results indicate the effectiveness of our framework in extracting high-quality violence scores and building accurate prediction models.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Machine Learning and Applications: An International Journal (MLAIJ) Vol.6, No.1, March 2019
DOI: 10.5121/mlaij.2019.6102
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS TESTING?
Nagdev Amruthnath¹ and Tarun Gupta²
¹Department of Industrial Engineering, Western Michigan University, Kalamazoo, MI, 49008, USA
²Department of Industrial and Entrepreneurial Engineering, Western Michigan University, Kalamazoo, MI, 49008, USA
ABSTRACT
Predictive maintenance and condition-based monitoring systems have gained significant prominence in recent years as a means of minimizing the impact of machine downtime on production and its costs. Predictive maintenance uses concepts from data mining, statistics, and machine learning to build models capable of early fault detection, fault diagnosis, and time-to-failure prediction. Fault diagnosis is one of the core areas, in which the actual failure mode of the machine is identified. In fluctuating environments such as manufacturing, clustering techniques have proved to be more reliable than supervised learning methods. One of the fundamental challenges of clustering is developing a test hypothesis and choosing an appropriate statistical test for hypothesis testing. Most statistical analyses rest on underlying assumptions about the data that most real-world data cannot satisfy. This paper addresses this challenge by developing a test hypothesis for a fault diagnosis application using a clustering technique and performing the PERMANOVA test for hypothesis testing.
KEYWORDS
Clustering analysis, PERMANOVA, fault diagnosis, predictive maintenance
1. INTRODUCTION
Machine maintenance has been one of the critical components of the manufacturing industry. It is estimated that US industry spends about $200 billion a year on machine maintenance [1]. Over the years, various maintenance techniques have been proposed, such as preventive maintenance (PM), reliability-centered maintenance (RCM), and predictive maintenance (PdM) [2]. Each new method has proved more efficient, providing a competitive advantage in both the quality and cost of the product. Most manufacturing industries today use preventive maintenance due to its high maintenance efficiency. In the last couple of years, predictive maintenance has seen significant acceptance in manufacturing, owing to the ability to access and handle manufacturing process data in real time, inexpensive sensors [3], and software capable of handling big data and performing real-time analytics. Today, predictive maintenance involves collecting machine data, performing signal processing, early fault detection, fault diagnosis, time-to-failure prediction, and maintenance resource optimization and scheduling, using concepts from statistics, machine learning, and data mining [4]. Based on these components, various architectures and processing models have been proposed over the years, and most of them use supervised learning [5][6]. In a real-world manufacturing environment, most machines work in fluctuating environments where the operating temperature
varies based on the type of product, with fluctuating mechanical vibrations, fluctuating sound, and other industrial noises. This can add noise to the data. In these cases, a supervised learning model's accuracy deteriorates, as it is trained for a specific environment with specific failure modes. To overcome this challenge, supervised learning techniques can be trained with data that accounts for all the identified factors. This approach can sometimes be unrealistic to achieve, and in some cases it can be expensive [7]. Also, if the model is not trained on a new type of fault, then it would not be capable of detecting that fault. An alternative approach to overcoming this challenge is to use unsupervised learning models. This technique can be used for both early fault detection and fault diagnosis. In recent times, it has also been used to detect the severity of the machine state. Some of the conventional clustering models used for these applications are k-means [4], Gaussian mixture modeling [4][8], self-organizing maps [9], neural networks, and random forest clustering [10].
Clustering analysis is a data mining technique in which similar data are grouped based on the underlying information within those groups. This technique is generally used to provide deeper insight into the data. Clustering is extensively used in different applications such as customer segmentation in marketing [11][12], big data analytics [13], data mining [14], fraud detection [15], and sound analytics [16]. The most commonly used cluster validation techniques are internal cluster validation and external cluster validation [17]. In internal cluster validation, the structure of the clusters and the separation between and within clusters are determined based on different indices to validate the significance of the separation between groups [18]. In external cluster validation, the clustering results are verified against known class information to evaluate the effects [19]. Any model developed for fault detection and diagnosis can be substantiated and validated using class labels, if available, or by cross-referencing with actual events such as maintenance records (also called external cluster validation). This is one of the most widely used techniques. With this technique, statistics such as accuracy, the kappa value [20], and the no-information rate (NIR) are calculated to assess the reliability of the model [21]. In cases where no label information is available for fault diagnosis, a hypothesis can be designed, along with an appropriate statistical test, to determine whether there is any significant difference between the groups and to conclude that these groups are significant. Today, some of the most commonly used techniques are analysis of variance (ANOVA), multivariate analysis of variance (MANOVA), and the t-test. These tests provide a statistical test on the means of the test groups and a post hoc test to compare which pairs are significantly different. These techniques require some assumptions regarding the data, such as [22]:
• the data follow a normal distribution
• linearity in the data
• all groups have the same variance
• homogeneity of covariance
• the absence of significant outliers
The issue with real-world data is satisfying these assumptions so that the data fit a model. In most cases, outliers are present in the data, or a valid data point resembles an outlier; most data do not follow a normal distribution; and non-linearity is commonly observed. Over the years, different methods have been proposed to overcome these challenges, such as outlier detection tests to detect and remove outliers from the data [23], and transformation techniques to transform non-normal data to normality [24]. By removing outliers and transforming the data, there is a possibility of losing significant information. To overcome these problems, this paper proposes using permutational ANOVA, proposed by Anderson, for testing the statistical significance of fault diagnosis clustering models [25]. This statistical test is widely used in various applications such as ecology [26], biology, marine biology [27], and microbial studies [28]. One of the
main advantages of this technique is that there can be more factors than samples, and the presence of "zeros" in the data has no effect.
2. LITERATURE REVIEW
One of the most commonly used statistical tests for hypothesis testing is the t-test when comparing two groups' means; when there are more than two groups, ANOVA is used. ANOVA was first proposed by Ronald Fisher, a biologist and statistician. ANOVA is widely used for comparing the means of two or more groups for statistical significance [22]. This technique is extensively used in testing experimental data. When the data are collected, a hypothesis can be designed where the null hypothesis is "there is no significant difference in the means between the groups." Hypothesis testing and its significance level limit the Type I error rate (false positives), but the Type II error rate (false negatives) depends on the sample size, significance level, and effect size. In cases of multiple response variables, MANOVA is typically used.
Fault diagnosis and severity detection are critical components of predictive maintenance. With a proper diagnosis technique, the fault investigation step can be eliminated from the maintenance schedule, and hence the overall time to repair can be minimized. With severity detection techniques, the maintenance schedule can be optimized to minimize the cost of maintenance. Today, diagnosis is performed using machine learning [4], deep learning, statistics, and data mining techniques on labeled data. For a robust, reliable model that works across different environments and scales to different applications, unsupervised clustering techniques have shown favorable results in the current literature [4]. Clustering is used in various applications such as customer segmentation [12], image classification [29], data mining [14], production flow analysis [30], and pattern recognition. One of the key challenges with this technique is properly validating the results using an appropriate statistical test for whether there is a significant difference between the clusters. The other problem is that most statistical analyses commonly used today have a set of assumptions, and most real-world data fail to satisfy them. One of the widely encountered assumptions is normality, where parametric tests assume that the data follow a normal distribution. When this fails, the alternative is to use a non-parametric test.
3. HYPOTHESIS TESTING
In this research, a condition-based monitoring system was set up for an industrial furnace cooling fan. Accelerometers were mounted in radial and axial positions to monitor the degradation cycle of the machine [2]. The vibration data were collected using wireless vibration sensors mounted on the x-axis and y-axis, as shown in Figure 1 and Figure 2. The data sampling frequency was 2048 Hz [4]. The vibration data were collected every 10 minutes for five months. These data were processed continuously using machine learning techniques for feature extraction and fault diagnosis.
Figure 1: Raw vibration data sample for Y-axis in terms of g
Figure 2: Raw vibration data sample for X-axis in terms of g
3.1. FEATURE EXTRACTION AND ANALYSIS
Feature extraction in machine learning is the process of extracting significant attributes of the data. These characteristics can be statistical features, domain-specific features, or both. In this research, statistical features of the vibration data were extracted, including the mean, median, min, max, kurtosis, skewness, standard deviation, RMS, and range [31]. These features were obtained for both x-axis and y-axis data. The collected vibration data were a time series, and this information was transformed into the frequency domain using the fast Fourier transform, as shown in Figure 3 and Figure 4 [32]. The machine's operating frequency was identified to be 26.1 Hz, and hence significant features in the frequency domain were collected at the operating frequency.
Figure 3: FFT result of vibration data for Y-axis
Figure 4: FFT result of vibration data for X-axis
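The paper does not include code, but the feature set described above can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation; the synthetic 26.1 Hz signal stands in for the real accelerometer data.

```python
import numpy as np
from scipy import stats

def extract_features(signal):
    """Compute the statistical features named above for one vibration window."""
    return {
        "mean": np.mean(signal),
        "median": np.median(signal),
        "min": np.min(signal),
        "max": np.max(signal),
        "kurtosis": stats.kurtosis(signal),
        "skewness": stats.skew(signal),
        "std": np.std(signal, ddof=1),
        "rms": np.sqrt(np.mean(signal ** 2)),
        "range": np.ptp(signal),
    }

def dominant_frequency(signal, fs=2048):
    """FFT of the window; return the frequency bin with the largest magnitude."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[0] = 0.0  # ignore the DC component
    return freqs[np.argmax(spectrum)]

# Synthetic stand-in: a 26.1 Hz tone (the fan's operating frequency) plus noise
fs = 2048
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 26.1 * t) + 0.1 * rng.standard_normal(fs)

features = extract_features(signal)
peak = dominant_frequency(signal, fs)
```

With a one-second window at 2048 Hz, the FFT resolution is 1 Hz, so the dominant bin lands at the nearest bin to the 26.1 Hz operating frequency.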
Feature data analysis can be performed using various techniques available today. Some of the most commonly used methods are histograms and density plots to identify the distribution in
the data; uniqueness and variance analysis to verify that all the collected features are significant and unique; and box plots to detect any presence of outliers. In this research, density analysis of various features revealed multi-modal density plots. This information is later used in our clustering assumption. By carrying out the test for unique values, we determined that all the features in the data had unique values. By plotting box plots of the normalized data, we identified the range of the feature data and the presence of outliers, as shown in Figure 5.
Figure 5: Box-Plot for normalized feature data
3.2. CLUSTER ANALYSIS
Clustering analysis is an unsupervised learning technique used to group similar types of data. The data can be clustered based on density [33], distance [34], p-value, or various other criteria. Some of the most commonly used models in clustering for machine state detection are k-means [35], hierarchical clustering, Gaussian mixture model (GMM) based clustering [4], Bayesian mixture model-based clustering, density-based clustering, and fuzzy clustering [32]. The GMM clustering technique was used to cluster the data into distinct groups. A GMM is a probabilistic model that assumes all the feature data are generated from a mixture of a finite number of Gaussian (normal) distributions with unknown parameters [8].
In clustering, there is no definitive answer for identifying the number of clusters. In this case, the number of clusters was unknown. To determine the optimal number of clusters, different techniques are available, such as silhouette width, AIC [36], BIC [37], and the within-cluster sum of squares (WSS) [4]. In this research, the optimal number of clusters was identified using the WSS technique, since the performance of the AIC and BIC techniques degrades as the dimensionality of the data increases [38]. The result was confirmed by generating the average silhouette width for k clusters and identifying the point of maximum separation. The results of the WSS technique are shown in Figure 6, and the average silhouette width for k clusters is shown in Figure 7. Here, we can observe that there is no significant difference in WSS after cluster six, unlike for smaller numbers of clusters. Using this information, we can confirm that the optimal number of clusters is six. The average silhouette width calculated for k clusters is shown in Figure 8. The maximum separation is observed at k = 3 clusters. Upon closer observation, we can also see that there is no significant difference in average separation between k = 2 and k = 8. This information also confirms that the optimal number of clusters identified in the data is
six. Upon determining the optimal number of clusters, the feature data were clustered using GMM clustering. The results are shown in Figure 9.
Figure 6: WSS and elbow technique for identifying the optimal number of clusters
Figure 7: Average silhouette separation for 2 to 15 clusters
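The WSS elbow idea can be illustrated with a minimal sketch. This uses a small hand-rolled k-means to produce the WSS curve (the paper clusters with a GMM; k-means is used here only to keep the illustration self-contained) on synthetic blob data:

```python
import numpy as np

def kmeans_wss(X, k, n_iter=50, seed=0):
    """Lloyd's k-means; return the within-cluster sum of squares (WSS) for k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center, then move centers to cluster means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    resid = np.linalg.norm(X - centers[labels], axis=1)
    return float(np.sum(resid ** 2))

# Synthetic feature data with three well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(40, 2)) for loc in (0.0, 5.0, 10.0)])

wss = {k: kmeans_wss(X, k) for k in range(1, 7)}
# The "elbow": WSS drops sharply up to the true number of clusters, then flattens
```

Plotting `wss` against k reproduces the elbow shape used in Figure 6 to pick the number of clusters.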
From the maintenance records, it was identified that cluster 1 was the normal operating state of the machine after fan replacement. Cluster 2 was the shaft displacement failure mode. Cluster 3 was identified as the furnace power-off state. Cluster 4 was the normal state of the machine. Cluster 5 was the repaired state. Cluster 6 was identified as the imbalance state.
Figure 8: Cluster widths within clusters
Figure 9: GMM cluster results in time series
In the cluster data, we can observe that an imbalance was present in the assembly from the beginning of data collection. On August 26, an abnormal state of the machine was identified, with the severity of the abnormality increasing. On September 1, during scheduled maintenance, it was found that the shaft connecting the gear assembly and the fan was broken. Upon replacement of the shaft assembly, a new state formed, representing the repaired state. On November 14, we can observe the severity of the imbalance increasing. The entire assembly was replaced during the scheduled maintenance. After the replacement, the imbalance state did not recur, and a new cluster formed for the new assembly. The machine was shut down from December 24 to December 30. This was accurately recognized by the clustering model, which formed a new cluster. A brief imbalance cluster formation is noticed shortly before shutdown and briefly after turning the machine back on. This imbalance condition was confirmed to be normal operation for the machine.
3.3. VISUALIZATION OF DATA IN LOW DIMENSIONAL SPACE
In this research, studying the groups within the data required dimensionality reduction. This was achieved using principal component analysis (PCA) [39]. Using PCA, the variance within the data was maximized, and the first three components were chosen based on
the scree plot shown in Figure 10, in order to visualize the data in 3-dimensional space to detect any groups, as shown in Figure 11, and to visualize the clustering results in 3-D space, as shown in Figure 12. In Figure 11, we can observe the presence of groups, which are identified using red circles.
Figure 10: Variances for all the principal component vectors
Upon adding the cluster results to the 3D plot, we can observe all six groups in a lower-dimensional space, as shown in Figure 12. With this visualization, we can subjectively validate the clustering results.
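The PCA step can be sketched with plain linear algebra. This is an illustrative reconstruction on synthetic rank-3 data, not the authors' code:

```python
import numpy as np

def pca(X, n_components=3):
    """PCA via SVD of the centered data; returns scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = S ** 2 / np.sum(S ** 2)      # scree-plot values
    scores = Xc @ Vt[:n_components].T        # projection onto top components
    return scores, var_ratio

# Synthetic stand-in for the 9-feature table: 200 rows driven by 3 latent factors
rng = np.random.default_rng(2)
latent = rng.standard_normal((200, 3))
X = latent @ rng.standard_normal((3, 9)) + 0.05 * rng.standard_normal((200, 9))

scores, var_ratio = pca(X, n_components=3)
# With rank-3 structure, the first three components capture nearly all the variance,
# which is the criterion the scree plot in Figure 10 is used to check
```

The three columns of `scores` are exactly the coordinates one would plot in the 3-D views of Figures 11 and 12.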
3.4. DESIGNING HYPOTHESIS
One of the essential tasks in this study was to group the data into its distinct groups: first by identifying the optimal number of clusters, and then by grouping using the GMM technique. It was also very important that the clustered groups represent the actual states of the machine. The clustering results can be validated using the maintenance records and the clustered groups in time series. It was also essential to show a significant difference between the groups or clusters, which can be achieved by developing a hypothesis and performing the appropriate statistical analysis. In this research, a hypothesis was designed to identify the significant difference between groups.
Hypothesis
H0: There is no significant difference in variation between the cluster groups
Ha: There is a significant difference in variation between the cluster groups
The hypothesis can be tested using various statistical tests such as ANOVA, the chi-square test, and MANOVA. Since the data in this research were multivariate, a multivariate analysis was chosen.
Figure 11: 3-D plot of the first three principal components at different angles
Figure 12: 3-D plot of the first three principal components at different angles with clustering results
4. RESULTS AND ANALYSIS
4.1. ONE-HOT ENCODING
One-hot encoding is a feature transformation technique in which a categorical variable is transformed into binary indicator variables, here indicating the state of the machine [40]. When this transformation is performed, univariate data are converted to multivariate response data. In classification problems, categorical variables are usually converted to numeric variables so the machine can process them. This can create a bias during the training process, where a categorical variable assigned a higher number might be given greater prominence than a variable with a lower number, even though all variables should have equal prominence. In instances such as this case, one-hot encoding is performed to overcome that bias.
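A minimal one-hot encoder for machine-state labels might look as follows (an illustrative sketch; the state names below are hypothetical stand-ins):

```python
import numpy as np

def one_hot(labels):
    """Map categorical machine-state labels to binary indicator columns."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)), dtype=int)
    for row, lab in enumerate(labels):
        encoded[row, index[lab]] = 1
    return encoded, categories

states = ["normal", "imbalance", "normal", "shaft", "repaired"]
encoded, categories = one_hot(states)
# Each row has exactly one 1; no state is given a larger numeric value than another,
# which is precisely the ordinal bias the transformation avoids
```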
4.2. HYPOTHESIS TESTING USING MANOVA
4.2.1 Sampling
In statistical hypothesis testing, it is important to sample data from the population. In this study, samples from each cluster (machine state) were extracted through randomization. The sample size for each cluster is 30.
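The per-cluster random sampling can be sketched as follows (assuming, for illustration only, 60 observations per state; the paper's actual per-cluster counts differ):

```python
import numpy as np

def sample_per_cluster(labels, size=30, seed=0):
    """Draw a simple random sample of row indices from each cluster, without replacement."""
    rng = np.random.default_rng(seed)
    samples = {}
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        samples[cluster] = rng.choice(idx, size=size, replace=False)
    return samples

# Hypothetical label vector: 6 machine states, 60 observations each
labels = np.repeat(np.arange(6), 60)
samples = sample_per_cluster(labels, size=30)
```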
4.2.2 Test for assumptions
MANOVA is the multivariate version of ANOVA. Similar to ANOVA, MANOVA has a set of assumptions, which are as follows:
• the data follow a normal distribution
• linearity in the data
• all groups have the same variance
• homogeneity of covariance
Along with these assumptions, MANOVA is also very sensitive to outliers in the data. If any of the assumptions fail to be met during the analysis, then the test would be insignificant. In this research, based on the density plots of the feature data, it was identified that the data follow a multi-modal distribution, and this unmistakably fails to meet the normality assumption. The Shapiro-Wilk normality test was used to test the normality of the feature data [23].
The null and alternative hypotheses for the test were as follows:
Hypothesis:
H0: the distribution of the data is not significantly different from the normal distribution
Ha: the distribution of the data is significantly different from the normal distribution
From the normality test, we observe that the p-value was less than 0.05, indicating that the distribution of the data is significantly different from a normal distribution, and hence we reject the null hypothesis. From the results, it was determined that the data do not follow a normal distribution, as shown in Table 1 and Figure 13. Also, in Figure 5, we can observe the presence of outliers in the data.
Table 1: Shapiro-Wilk normality test
W: 0.96095
p-value: 6.573e-05
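The Shapiro-Wilk test is available in SciPy. A minimal sketch on a clearly non-normal stand-in sample (the exponential draw is illustrative, not the paper's feature data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Stand-in for a skewed, multi-modal feature: an exponential sample of size 100
feature = rng.exponential(scale=1.0, size=100)

W, p_value = stats.shapiro(feature)
# p < 0.05: reject H0, i.e. the feature is significantly non-normal,
# mirroring the conclusion drawn from Table 1
```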
Figure 13: Test for normality in the feature data
Bartlett's test of homogeneity of variances was used to test for equality of variances across groups [41]. The hypotheses for this test are as follows:
Hypothesis:
H0: two or more samples drawn from the same population have equal variance
Ha: two or more samples drawn from the same population do not have equal variance
Table 2: Bartlett's test
Bartlett's K-squared: 93.919
df: 5
p-value: < 2.2e-16
From Bartlett's test, we observe that the p-value was less than 0.05, indicating that two or more samples drawn from the same population do not have equal variance, and hence we reject the null hypothesis. From the results, it was determined that there are at least two groups with unequal variances.
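Bartlett's test is likewise a one-liner in SciPy. Here it is applied to synthetic groups with deliberately unequal variances (illustrative stand-ins for the machine-state clusters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Three stand-in clusters; the third has a much larger variance
g1 = rng.normal(0.0, 1.0, 30)
g2 = rng.normal(0.0, 1.0, 30)
g3 = rng.normal(0.0, 5.0, 30)

K2, p_value = stats.bartlett(g1, g2, g3)
# p < 0.05: at least two groups have unequal variances, as in Table 2
```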
Failing to meet the above assumptions is not unique to this dataset. In domains such as ecology, microbial analysis, and biology, it is often hard to satisfy these assumptions. To overcome this challenge, a new statistical analysis technique called PERMANOVA was proposed by Anderson in 2001 [25].
4.3. HYPOTHESIS TESTING USING PERMANOVA
PERMANOVA, also called permutational MANOVA, was proposed by Anderson to overcome the challenges of MANOVA [25], including MANOVA's assumptions and the requirement that the sample size be greater than the number of factors. This technique performs the analysis and partitions the sums of squares using dissimilarities. It is also known as non-parametric MANOVA. Here, the inputs are linear predictors and a response matrix with an arbitrary number of columns; it is a robust alternative both to parametric MANOVA and to ordination methods for describing how variation is attributed to different experimental treatments or uncontrolled covariates [42].
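Anderson's partitioning of the sum of squares from a distance matrix, with a permutation p-value, can be sketched directly. This is a simplified one-way reconstruction on synthetic two-cluster data, not a production implementation (in practice one would use an established routine such as R's vegan::adonis2):

```python
import numpy as np

def permanova(D, labels, permutations=199, seed=0):
    """One-way PERMANOVA (Anderson, 2001): pseudo-F from a distance matrix
    plus a permutation p-value. D is an (n, n) symmetric distance matrix."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    D2 = np.asarray(D) ** 2

    def pseudo_F(lab):
        # SS_total: all pairwise squared distances, divided by n
        ss_total = D2[np.triu_indices(n, k=1)].sum() / n
        # SS_within: pairwise squared distances inside each group
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(lab == g)
            sub = D2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a))

    F_obs = pseudo_F(labels)
    rng = np.random.default_rng(seed)
    count = sum(pseudo_F(rng.permutation(labels)) >= F_obs
                for _ in range(permutations))
    return F_obs, (count + 1) / (permutations + 1)

# Two well-separated stand-in clusters in 3-D feature space
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(6, 1, (20, 3))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
labels = np.array([0] * 20 + [1] * 20)

F_stat, p_value = permanova(D, labels)
```

Because only distances are used, no normality or equal-variance assumption is needed, which is exactly why the test suits the feature data that failed the Shapiro-Wilk and Bartlett checks.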
Hypothesis:
H0: the centroids of the groups, as defined in the space of the chosen resemblance measure, are equivalent for all groups
Ha: the centroids of the groups, as defined in the space of the chosen resemblance measure, are not equivalent for all groups
The results of the PERMANOVA hypothesis test are shown in Table 3.
Table 3: PERMANOVA statistical test results

           DF    Sum Sq    F       Pr(>F)
Group      26    132.964   45.92   0.001
Residual   153   17.03
In the results in Table 3, the permutation test was performed sequentially. Based on the results, we can see that the p-value is less than the significance level of 0.05, indicating that there is a significant difference between groups. From this test, we can conclude that the centroids of the groups, as defined in the space of the chosen resemblance measure, are not equivalent for all groups, and we reject the null hypothesis.
To observe the multivariate dispersion among the groups, a beta diversity test was performed. The beta diversity test is a multivariate analog of Levene's test for homogeneity of variances. Non-Euclidean distances between objects and group centroids are handled by reducing the original distances to principal coordinates. This procedure has recently been used as a means of assessing beta diversity [42]. The hypothesis is as follows:
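A simplified, Euclidean stand-in for this dispersion test can be sketched as follows: compute each observation's distance to its group centroid, then run a one-way F-test on those distances, in the spirit of R's betadisper (illustrative only; the full procedure works in principal-coordinate space):

```python
import numpy as np
from scipy import stats

def dispersion_test(X, labels):
    """Levene-style test on each observation's distance to its group centroid
    (a simplified Euclidean stand-in for the PCoA-based dispersion test)."""
    labels = np.asarray(labels)
    dists = []
    for g in np.unique(labels):
        pts = X[labels == g]
        centroid = pts.mean(axis=0)
        dists.append(np.linalg.norm(pts - centroid, axis=1))
    return stats.f_oneway(*dists)

rng = np.random.default_rng(6)
# Two stand-in machine states sharing a centroid but with very different spread
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(0, 4.0, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)

F_stat, p_value = dispersion_test(X, labels)
# A small p-value indicates the group dispersions differ, as in Table 4
```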
Hypothesis
H0: there is no difference between group dispersions (here, between machine states)
Ha: there is a significant difference between group dispersions
The results of testing the hypothesis are shown in Table 4.
Table 4: The analysis of multivariate homogeneity of group dispersions (variances)

           DF    Sum Sq    Mean Sq   F       N. Perm   Pr(>F)
Group      5     100.45    20.099    19.87   999       1.20e-15
Residual   174   175.92    1.01
Table 5: Pairwise comparison of permuted (upper triangle) and observed (lower triangle) p-values

        1            2            3            4            5            6
1       —            0.91500      0.10600      1.0000e-03   5.0000e-03   0.001
2       0.91744      —            0.02600      1.0000e-03   1.0000e-03   0.001
3       0.10402      0.02325      —            7.0000e-03   1.0000e-03   0.001
4       1.5085e-04   1.4035e-07   7.2922e-03   —            1.0000e-03   0.001
5       3.6307e-03   5.3810e-04   3.2128e-06   7.1421e-11   —            0.138
6       2.8593e-04   7.0673e-05   1.2442e-06   8.9320e-10   0.1359       —
Figure 14: Box plot for clusters and distance to the centroid
Figure 15: PCoA for clusters
Based on the p-value in Table 4, we can observe a significant difference between the groups and
reject the null hypothesis. Since the test results are significant, a pairwise comparison of observed
and permuted p-values was calculated among the cluster groups. In these results, pairs 1-2, 1-3,
and 5-6 were not significantly different, while the rest of the pairs were significantly different
from each other. From the cluster analysis, we can see that pairs 1-2 and 1-3 correspond to normal
operating conditions of the machine, and the statistical test provides sufficient rationale that there
should be no significant difference among these pairs. Pair 5-6 corresponds to the non-normal
modes of the machine: cluster 5 was the displaced shaft, and cluster 6 was the imbalance. In both
cases an imbalance was present, and hence the test provides sufficient rationale for this case as
well.
A box plot of the clusters and their distances to the centroid, shown in Figure 14, was used to
verify that the cluster means are significantly different from each other. In the box plot, we can
see that the cluster centroids differ significantly across clusters. A two-dimensional principal
coordinate analysis (PCoA), shown in Figure 15, was used to study the multivariate dispersion. In
the PCoA plot, we can observe that cluster 1, cluster 3, and cluster 4 are well separated, while
cluster 6 is scattered over cluster 2 and cluster 5. This is a strong indication of the presence of an
imbalance condition in cluster 4 and cluster 5.
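For reference, PCoA itself reduces to an eigendecomposition of the double-centred squared-distance matrix. The Python sketch below is illustrative only (the paper's plots were produced in R); the toy points are hypothetical.

```python
import numpy as np

def pcoa(dist, k=2):
    """Principal coordinate analysis (classical metric MDS): embed an
    (n, n) distance matrix in k dimensions via the eigendecomposition
    of Gower's double-centred matrix B = -1/2 * J D^2 J."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    b = -0.5 * j @ d2 @ j                  # Gower's centred matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]         # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates: eigenvectors scaled by sqrt of (non-negative) eigenvalues.
    return vecs[:, :k] * np.sqrt(np.maximum(vals[:k], 0.0))

# For Euclidean input distances, a full-rank PCoA reproduces the
# pairwise distances exactly.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
coords = pcoa(dist, k=2)
recon = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
```

Plotting the first two coordinates, coloured by cluster label, yields a scatter like Figure 15, where separation between clusters can be judged visually.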
From the results of the statistical tests and analysis, we have sufficient evidence to reject the null
hypothesis and conclude that there is a significant difference in the variation between the cluster
groups.
5. CONCLUSION
In this paper, vibration and ambient temperature data were collected from an industrial furnace
fan over five months. The data were analyzed using density plots, where multi-modal
distributions were observed. Based on this observation, Gaussian finite mixture model-based
clustering was used to cluster the data into distinct groups, and the optimal number of clusters
was identified using WSS and the elbow technique. In this research, no class labels were
available to test the validity of the model. Hence, a hypothesis was developed to identify whether
there was any significant difference between the groups (clusters). Different statistical tests, such
as ANOVA and MANOVA, were considered for hypothesis testing, and the data were reanalyzed
to identify whether they met the assumptions of those tests. Based on this analysis, it was
concluded that the identified tests failed to meet their assumptions. Hence, a technique proposed
by Anderson called PERMANOVA, along with post-hoc pairwise testing, was performed to
statistically validate the results. The PERMANOVA test identified a significant difference among
the groups at a p-value less than the significance level of 0.05.
A test for the analysis of multivariate homogeneity of group dispersions (variances) was also
performed, and a significant difference between the groups was identified. In the pairwise
comparison of permuted and observed p-values, the normal operating conditions showed no
significant difference among themselves; likewise, there was no significant difference between
the imbalance states of the machine. This research also provides a robust statistical technique for
observing whether the maintenance performed on a machine had any significant effect in
returning it to normal condition. From these results, we have sufficient evidence to reject the null
hypothesis and conclude that there is a significant difference among the clusters. We can also
conclude that PERMANOVA and pairwise testing can be used to identify the significance of
maintenance in resolving machine-related issues.
6. FUTURE SCOPE OF WORK
Using the PERMANOVA test, a statistical difference between the clusters was identified. In
machine maintenance, it is important to study which factors affect the formation of each cluster.
This research is intended to be expanded further to perform factor analysis to determine those
critical factors and to predict the time to failure.
REFERENCES
[1] R. K. Mobley, An Introduction to predictive maintenance, 2 ed., 2002.
[2] NASA, “Reliability Centered Maintenance Guide for Facilities and Collateral Equipment,” National
Aeronautics and Space Administration, Washington, D.C., 2000.
[3] N. Amruthnath and P. M. Prathibhavani, “Data Security in wireless Sensor Network using Multipath
Randomized Dispersive Routes.,” International Journal of Scientific & Engineering Research, vol. 5,
no. 1, 2014.
[4] N. Amruthnath and T. Gupta, “Fault Class Prediction in Unsupervised Learning using Model-Based
Clustering Approach,” in In Information and Computer Technologies (ICICT), 2018 International
Conference, 2018.
[5] M. C. Garcia, M. A. Sanz-Bobi and J. d. Pico., “SIMAP: Intelligent System for Predictive
Maintenance: Application to the health condition monitoring of a wind turbine gearbox,” Computers
in Industry, vol. 57, no. 6, pp. 552-568, 2006.
[6] C. Groba, S. Cech, F. Rosenthal and A. Gossling, “Architecture of a predictive maintenance
framework.,” in In Computer Information Systems and Industrial Management Applications, 2007.
CISIM'07. 6th International Conference on, 2007.
[7] V. R. de Sa, “Learning classification with unlabeled data,” In Advances in neural information
processing systems, pp. 112-119, 1994.
[8] C. Archambeau and M. Verleysen, “Fully Nonparametric Probability Density Function Estimation
with Finite Gaussian Mixture Models,” in ICAPR'2003 proceedings - 5th International Conference on
Advances in Pattern Recognition, Calcutta, 2003.
[9] T. Kohonen, “The self-organizing map.,” Neurocomputing, vol. 21, no. 1-3, pp. 1-6, 1998.
[10] A. Liaw and M. Wiener, “Classification and regression by randomForest,” R news, vol. 2, no. 3, pp.
18-22, 2002.
[11] M. Wedel and W. A. Kamakura, Market segmentation: Conceptual and methodological foundations.,
vol. 8, Springer Science & Business Media, 2012.
[12] G. Punj and D. W. Stewart, “Cluster analysis in marketing research: Review and suggestions for
application,” Journal of marketing research , pp. 134-148, 1983.
[13] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou and A. Bouras., “A
survey of clustering algorithms for big data: Taxonomy and empirical analysis,” IEEE transactions on
emerging topics in computing, vol. 2, no. 3, pp. 216-279, 2014.
[14] P. Berkhin, “A survey of clustering data mining techniques.,” In Grouping multidimensional data, pp.
25-71, 2006.
[15] R. J. Bolton and D. J. Hand., “Unsupervised profiling methods for fraud detection,” Credit Scoring
and Credit Control VII , pp. 235-255, 2001.
[16] E. Parizet, E. Guyader and V. Nosulenko, “Analysis of car door closing sound quality.,” Applied
acoustics, vol. 69, no. 1, pp. 12-22, 2008.
[17] M. Halkidi, Y. Batistakis and M. Vazirgiannis, “On clustering validation techniques,” Journal of
intelligent information systems, vol. 17, no. 2-3, pp. 107-145, 2001.
[18] Y. Liu, Z. Li, H. Xiong, X. Gao and J. Wu., “Understanding of internal clustering validation
measures.,” in In Data Mining (ICDM), 2010 IEEE 10th International Conference, 2010.
[19] E. Rendón, I. Abundez, A. Arizmendi and E. M. Quiroz., “Internal versus external cluster validation
indexes,” International Journal of computers and communications , vol. 5, no. 5, pp. 27-34, 2011.
[20] A. J. Viera and J. M. Garrett., “Understanding interobserver agreement: the kappa statistic,” Fam
Med, vol. 37, no. 5, pp. 360-363, 2005.
[21] M. Kuhn, “Building predictive models in R using the caret package,” Journal of statistical software,
vol. 28, no. 5, pp. 1-26, 2008.
[22] G. V. Glass, P. D. Peckham and J. R. Sanders, “Consequences of failure to meet assumptions
underlying the fixed effects analyses of variance and covariance.,” Review of educational research,
vol. 42, no. 3, pp. 237-288, 1972.
[23] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (complete samples),”
Biometrika, vol. 52, no. 3/4, pp. 591-611, 1965.
[24] R. M. Sakia, “The Box-Cox transformation technique: a review.,” The statistician , pp. 169-178,
1992.
[25] M. J. Anderson, “A new method for non‐parametric multivariate analysis of variance,” Austral
ecology, vol. 26, no. 1, pp. 32-46, 2001.
[26] M. J. Anderson and D. C. Walsh., “PERMANOVA, ANOSIM, and the Mantel test in the face of
heterogeneous dispersions: what null hypothesis are you testing?,” Ecological monograph, vol. 83, no.
4, pp. 557-574, 2013.
[27] S. R. Tracey, J. M. Lyle and G. Duhamel., “Application of elliptical Fourier analysis of otolith form
as a tool for stock identification,” Fisheries Research, vol. 77, no. 2, pp. 138-147, 2006.
[28] R. N. Carmody, G. K. Gerber, J. M. L. Jr, D. M. Gatti, L. Somes, K. L. Svenson and P. J. Turnbaugh,
“Diet dominates host genotype in shaping the murine gut microbiota,” Cell host & microbe , vol. 17,
no. 1, pp. 72-84, 2015.
[29] F. Moosmann, E. Nowak and F. Jurie, “Randomized clustering forests for image classification.,”
IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 30, no. 9, pp. 1632-1646,
2008.
[30] N. Amruthnath and T. Gupta, “Modified Rank Order Clustering Algorithm Approach by Including
Manufacturing Data,” in 4th IFAC International Conference on Intelligent Control and Automation
Sciences, Reims, 2016.
[31] W. Revelle, “psych: Procedures for Personality and Psychological Research,” Northwestern
University, Evanston, Illinois, USA, 2018.
[32] N. Amruthnath and T. Gupta, “A Research Study on Unsupervised Machine Learning Algorithms for
Early Fault Detection in Predictive Maintenance,” in 5th International Conference on Industrial
Engineering and Applications, Singapore, 2018.
[33] C. Fraley and A. E. Raftery, “Model-based Clustering, Discriminant Analysis and Density
Estimation,” Journal of the American Statistical Association, vol. 97, pp. 611-631, 2002.
[34] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the
Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 100-108, 1979.
[35] C. T. Yiakopoulos, K. C. Gryllias and I. A. Antoniadis, “Rolling element bearing fault detection in
industrial environments based on a K-means clustering approach,” Expert Systems with Applications,
vol. 38, no. 3, pp. 2888-2911, 2011.
[36] H. Akaike, “Factor analysis and AIC,” In Selected Papers of Hirotugu Akaike,, pp. 371-386., 1987.
[37] K. P. Burnham and D. R. Anderson., “Multimodel inference: understanding AIC and BIC in model
selection,” Sociological methods & research, vol. 33, no. 2, pp. 261-304, 2004.
[38] K. W. Broman and T. P. Speed, “A model selection approach for the identification of quantitative trait
loci in experimental crosses.,” Journal of the Royal Statistical Society: Series B (Statistical
Methodology), vol. 64, no. 4, pp. 641-656, 2002.
[39] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and intelligent
laboratory systems, vol. 2, no. 1-2, pp. 37-52, 1987.
[40] J. E. Beck and B. P. Woolf., “High-level student modeling with machine learning,” in In International
Conference on Intelligent Tutoring Systems, Berlin, Heidelberg, 2000.
[41] M. S. Bartlett, “Tests of significance in factor analysis,” British Journal of Mathematical and
Statistical Psychology, vol. 3, no. 2, pp. 77-85, 1950.
[42] J. Oksanen, F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, R. B.
O'Hara, and G. L., “vegan: Community Ecology Package,” R package version 2.4-4, 2017.
AUTHORS
Nagdev Amruthnath is currently a Ph.D. student in the Industrial Engineering Department,
Western Michigan University (WMU). He earned his master's degree in Industrial
Engineering from WMU and a Bachelor's degree in Information Science and Engineering
from Visvesvaraya Institute of Technology, Karnataka, India. He has four years of
experience working in manufacturing industry specializing in the implementation of lean
manufacturing and JIT technologies, one year of experience in data analytics, machine
learning and AI applications in manufacturing and undergrad teaching experience. The author has journal
and conference proceeding publications in production flow analysis, ergonomics, machine learning, and
wireless sensor networks. Currently, his research focus is on developing machine learning and AI
technologies for manufacturing application. Nagdev continues to serve as a reviewer of scholarly journals
which includes Journal of Electrical Engineering and IEEE transactions on Reliability.
Dr. Tarun Gupta has been a Professor in the Department of Industrial & Entrepreneurial
Engineering at Western Michigan University for 29 years. He earned a Ph.D. in
industrial & systems engineering, with a minor in computer science, from the
University of Wisconsin in 1988; a B.Tech in Mechanical Engineering from IITBHU
Varanasi in 1979; and a master's in Industrial & Systems Engineering from NITIE,
Mumbai, India, in 1981. His prime areas of research are manufacturing automation and robotics.
He has also continued research in the nano-sciences area and in machine learning and data
analytics applications in advanced manufacturing systems. He has published over 100
papers in his areas of interest. He has also been the recipient of numerous consulting assignments from area
industry for specific manufacturing systems challenges. Dr. Gupta has also served as associate editor of
International Journal of Robotics and Automation and continues to serve as a reviewer of scholarly journals
which includes International Journal of Production Research, International Journal of Nanophotonics,
International Journal of Computer & Industrial Engineering, IJCI, & Journal of Operations Management.
Dr. Gupta is a member of IEEE, lifetime member of SME and lifetime member of Society of Photonics &
Instrumentation Engineers (SPIE), and a past member of IIE, ORSA, ASEE.