Public health care systems routinely collect health-related data from the population. These data can be analyzed using data mining techniques to find novel, interesting patterns that could help formulate effective public health policies and interventions. The occurrence of chronic illness is rare in the population, and the effect of this class imbalance on the performance of various classifiers was studied. The objective of this work is to identify the best classifiers for class-imbalanced health datasets through a cost-based comparison of classifier performance. The popular open-source data mining tool WEKA was used to build a variety of core classifiers as well as classifier ensembles and to evaluate their performance. The unequal misclassification costs were represented in a cost matrix, and cost-benefit analysis was also performed. In another experiment, sampling methods such as under-sampling, over-sampling, and SMOTE were applied to balance the class distribution in the dataset, and the costs were compared. The Bayesian classifiers performed well, with high recall and a low number of false negatives, and were not affected by the class imbalance. Results confirm that the total cost of Bayesian classifiers can be further reduced using cost-sensitive learning methods. Classifiers built on the randomly under-sampled dataset showed a dramatic drop in costs and high classification accuracy.
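The cost comparison described above reduces to weighting each error type by its entry in the cost matrix. A minimal sketch of that idea (the counts, costs, and classifier descriptions below are invented for illustration, not taken from the study):

```python
# Hypothetical illustration of cost-based classifier comparison: correct
# predictions cost nothing, and a missed chronic-illness case (false
# negative) is weighted far more heavily than a false alarm.

def total_cost(tp, fp, fn, tn, cost_fp=1, cost_fn=10):
    """Total misclassification cost under an asymmetric cost matrix."""
    return fp * cost_fp + fn * cost_fn

# Two hypothetical classifiers on an imbalanced test set
# (950 negatives, 50 positives).
majority_vote = total_cost(tp=0,  fp=0,  fn=50, tn=950)   # always predicts "healthy"
bayes_like    = total_cost(tp=45, fp=60, fn=5,  tn=890)   # high recall, some false alarms

print(majority_vote)  # 500
print(bayes_like)     # 110
```

Note how the always-negative classifier scores 95% accuracy yet incurs the highest cost, which is exactly why the abstract compares classifiers by cost rather than accuracy.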
Many molecules in nature have a geometry that enables them to exist as non-superimposable mirror images, or enantiomers. Modulating the toxicity of such molecules offers possibilities for therapeutics, since they target multiple points in biochemical pathways. It was hypothesized that the toxicity of a chemical agent could be counteracted by a homeopathic preparation of the enantiomer of that chemical agent.
This slide explains the term biostatistics, important terms used in the field of biostatistics, and important applications of biostatistics in agriculture, physiology, ecology, genetics, molecular biology, taxonomy, etc.
How to establish and evaluate clinical prediction models - Statswork
A clinical prediction model can be used in various clinical contexts, including screening for asymptomatic illness, forecasting future events such as disease, and assisting doctors in their decision-making and health education. Despite the positive effects of clinical prediction models on practice, prediction modeling is a difficult process that necessitates meticulous statistical analysis and sound clinical judgment. Statswork offers statistical services tailored to customers' requirements. When you order statistical services at Statswork, we promise the following: always on time, outstanding customer support, and high-quality subject matter experts.
Read More With Us: https://bit.ly/3dxn32c
Why Statswork?
Plagiarism Free | Unlimited Support | Prompt Turnaround Times | Subject Matter Expertise | Experienced Biostatisticians & Statisticians | Statistics across Methodologies | Wide Range of Tools & Technologies Supported | Tutoring Services | 24/7 Email Support | Recommended by Universities
Contact Us:
Website: www.statswork.com
Email: info@statswork.com
United Kingdom: 44-1143520021
India: 91-4448137070
WhatsApp: 91-8754446690
Application of Pharma Economic Evaluation Tools for Analysis of Medical Condi... (IJREST)
Application of Pharma Economic Evaluation Tools for Analysis of Medical
Conditions: A Case Study of an Educational Institution in India
1 Dr. Debasis Patnaik, 2 Ms. Pranathi Mandadi
1Assistant Professor, Department of Economics, BITS-Pilani, K K Birla Goa Campus, Goa, India
2Research Scholar, Department of Economics, BITS-Pilani, K K Birla Goa Campus, Goa, India
ABSTRACT
The basic idea of a QALY is straightforward. The amount of time spent in a health state is weighted by the utility score given to that health state. It takes one year of perfect health (utility score of 1) to generate one QALY, whereas one year in a health state valued at 0.5 is regarded as being equivalent to half a QALY. Thus, an intervention that generates four additional years in a health state valued at 0.75 will generate one more QALY than an intervention that generates four additional years in a health state valued at 0.5. This paper discusses the effect of self-medication on health care, taking an educational institution population comprising students, teaching staff, and non-teaching staff in 2011.
Keywords: Pharma economics, QALY, measuring clinical and health excellence
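The QALY arithmetic in the abstract can be reproduced in a few lines; the figures below are exactly those quoted in the text:

```python
# QALYs = time in a health state weighted by its utility score (0..1).
def qalys(years, utility):
    return years * utility

assert qalys(1, 1.0) == 1.0   # one year of perfect health = 1 QALY
assert qalys(1, 0.5) == 0.5   # one year at utility 0.5 = half a QALY

# Four extra years at utility 0.75 vs. four extra years at 0.5
# differ by exactly one QALY, as the abstract states:
print(qalys(4, 0.75) - qalys(4, 0.5))  # 1.0
```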
MINING HEALTH EXAMINATION RECORDS: A GRAPH-BASED APPROACH (Nexgen Technology)
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ... (IJECEIAES)
According to health reports from Malang city, many people are exposed to sexually transmitted diseases, and most sufferers are unaware of the symptoms. Because Malang is known as a city of education, its population increases every year, which raises the risk of spreading sexually transmitted diseases. Solving this problem is important so that sufferers can be treated earlier, reducing the burden of patient spending. In this research, the authors apply data mining methods to classify sexually transmitted diseases. The experimental results show that k-NN is the best method for this problem, with 90% accuracy.
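As a rough illustration of the k-NN method the study found best, here is a stdlib-only sketch; the feature vectors and labels are invented stand-ins, not the Malang dataset:

```python
# Minimal k-nearest-neighbours classifier: majority vote among the k
# training points nearest (Euclidean distance) to the query point.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label); returns predicted label."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Invented 2-D encodings of screening attributes.
train = [((0.1, 0.2), "negative"), ((0.2, 0.1), "negative"),
         ((0.9, 0.8), "positive"), ((0.8, 0.9), "positive"),
         ((0.85, 0.7), "positive")]

print(knn_predict(train, (0.82, 0.75)))  # positive
```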
Clinical Research Informatics (CRI) Year-in-Review 2014 (Peter Embi)
Peter Embi's review of notable publications and events in the field of Clinical Research Informatics (CRI) that took place in 2013+. This was presented as the closing keynote presentation of the 2014 AMIA CRI Summit in San Francisco, CA on April 11, 2014.
REVIEW OF CLASS IMBALANCE DATASET HANDLING TECHNIQUES FOR DEPRESSION PREDICTIO... (IJCI JOURNAL)
Depression is a prevailing mental disturbance affecting an individual's thinking and mental development. There has been much research demonstrating effective automated prediction and detection of depression. Many of the datasets used suffer from class imbalance, where samples of a dominant class outnumber the minority class that is to be detected. This review paper uses the PRISMA review methodology to list the different class imbalance handling techniques used in depression prediction and detection research. The articles were taken from information technology databases. The research gap identified was that under-sampling methods were rarely used for predicting and detecting depression, and regression modelling could be considered for future research. The results also revealed that the most common data-level technique is SMOTE as a single method, and the common ensemble approach combines SMOTE with oversampling and under-sampling techniques. The model level consisted of various algorithms that can be used to tackle the class imbalance problem.
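For a sense of how a data-level technique like SMOTE works, here is a rough, stdlib-only sketch of SMOTE-style interpolation; real implementations (e.g. imbalanced-learn's SMOTE) select among the k nearest minority neighbours, whereas this toy version draws a neighbour at random:

```python
# SMOTE-style oversampling sketch: each synthetic minority sample is
# interpolated between a minority point and another minority point.
import random

def smote_like(minority, n_synthetic, rng=random.Random(0)):
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        nb = rng.choice([p for p in minority if p is not x])
        u = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Invented minority-class points; synthetic samples land between them.
minority = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2)]
print(len(smote_like(minority, 5)))  # 5
```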
CLASS IMBALANCE HANDLING TECHNIQUES USED IN DEPRESSION PREDICTION AND DETECTION (IJDKP)
Data stratification is the process of partitioning data into distinct, non-overlapping groups when the study population consists of subpopulations of particular interest. In clinical data, once the data are stratified into subpopulations based on a significant stratifying factor, different risk factors can be determined for each subpopulation. In this paper, Fisher's Exact Test is used to determine the significant stratifying factors. The experiments are conducted on a simulated study and on the Medical, Epidemiological and Social Aspects of Aging (MESA) data, constructed for prediction of urinary incontinence. Results show that smoking is the most significant stratifying factor in the MESA data: smokers and non-smokers exhibit different risk factors for urinary incontinence and should be treated differently.
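Fisher's Exact Test on a 2x2 table can be computed directly from the hypergeometric distribution; the sketch below (stdlib only, example tables invented) shows the idea, though production code would normally use a statistics library:

```python
# Two-sided Fisher's exact test for a 2x2 contingency table
# [[a, b], [c, d]]: sum the hypergeometric probabilities of all
# tables with the same margins that are no more likely than the
# observed table.
from math import comb

def fisher_exact_2x2(a, b, c, d):
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def p(x):  # hypergeometric prob. of a table with top-left cell x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-9))

# Balanced table -> p near 1; strongly skewed table -> small p.
print(fisher_exact_2x2(10, 10, 10, 10))
print(fisher_exact_2x2(12, 2, 3, 13))
```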
Reply DB5 w9 research
Reply discussion boards
1-jauregui
Discuss how the quantitative and qualitative data would complement one another and add strength to the study.
Evidently, the use of EBP in healthcare mostly relies on the available qualitative and quantitative data that is supported by scientific or clinical research. In studying EBP, quantitative data is used to enhance qualitative information and vice versa, because each method complements the other (Tappen, 2015, p. 88). For example, in the selected article, the EBP study of nurses' beliefs and behaviors showed that the number of nurses who were certified versus those who were not explained why some nurses have higher perceived EBP implementation than others (Eaton, Meins, Mitchell, Voss, & Doorenbos, 2015, "Evidence-Based Practice Beliefs and Behaviors"). Quantitative data would improve the study by providing evidence in the form of numbers or amounts, such as scores showing the proficiency of nurses in different areas (Eaton et al., 2015). Quantitative data could strengthen the study by providing more detailed information about EBP implementation, explaining certain trends and occurrences found in the research.
2- rosquete
Qualitative research is exploratory/descriptive and emphasizes the importance of the subjects' frame of reference and the context of the study. The research is more concerned with the truth as perceived by informants and less concerned with objective truth. The information from this research will be important in understanding the informants' behaviors in detail. This approach will be used to get the picture and the opinion of nursing caregivers on the use of CNS depressants by the elderly (Susan, Nancy, & Jennifer, 2013).
The method used is explorative/descriptive. The strengths of the descriptive method are: it is effective for analyzing non-quantified subjects and issues, it allows observing the phenomenon in a natural environment, it offers the opportunity to use qualitative and quantitative methods together, and it is less time-consuming than quantitative studies. In the case of exploratory studies, the principal advantage is flexibility and adaptability to change, and they are effective in laying the groundwork that guides future research. There are also disadvantages to these kinds of studies. For example, descriptive studies cannot test or verify the research problem statistically, the majority of descriptive studies are not repeatable due to their observational nature, and they are not helpful in identifying the cause behind the described phenomenon. Another weak point, which includes exploratory research, is that the interpretation of information is subject to bias. These types of studies use a modest number of samples that may not represent the target population, and they are not usually helpful in decision making.
Intensive care units deal with data that are dynamic in nature, from real-time measurements of health condition to laboratory test data that change continuously over time. Artificial intelligence (AI) has the potential to perform complex pattern analyses on large volumes of data. The discovered patterns can reveal new symptoms of disease in intensive care units (ICUs) and help doctors with new drug discovery. Current research focuses on making the ICU clinical workflow more efficient by generating high-risk patterns from large volumes of data. Emerging areas of AI in the ICU include mortality prediction, the use of powerful sensors, new drug discovery, prediction of length of stay, and legal aspects of drug use for severe disease. This review focuses on the latest applications of AI to drugs and other relevant issues for the ICU.
A systematic review on paediatric medication errors by parents or caregivers ... (Javier González de Dios)
The aim of the project "FARMAVIZOR, uso más seguro de la medicación en pacientes pediátricos en el hogar" (FARMAVIZOR, safer use of medication in paediatric patients at home) is to develop and evaluate an online intervention aimed at parents to increase the safety of paediatric medication use at home. The intervention includes a health education programme to promote safe medication use at home, together with the launch of a home incident-reporting system where parents can share experiences with other parents and learn to apply paediatric treatments at home correctly. All this information can be found on the project website, which we have titled "Mi cuaderno pediátrico seguro" (My safe paediatric notebook).
As part of this project, several research studies have begun to appear in biomedical journals, in this case the article "A systematic review on pediatric medication errors by parents or caregivers at home", published in Expert Opin Drug Saf (IF 4.250, Q2).
Abstract: Nowadays, detecting patients at elevated risk of diabetes mellitus is becoming critical to improved prevention and overall health management of these patients. We aim to apply association rule mining to electronic medical records (EMRs) to discover sets of risk factors and the corresponding subpopulations that represent patients at high risk of developing diabetes. Given the high dimensionality of EMRs, association rule mining generates a very large set of rules, which must be summarized for easy clinical use. We reviewed four association rule set summarization techniques and conducted a comparative evaluation to provide guidance regarding their applicability, advantages, and drawbacks. We proposed extensions to incorporate diabetes risk into the process of finding an optimal summary. We evaluated these modified techniques on a real-world cohort of borderline diabetic patients. We found that all four methods produced summaries that described subpopulations at high risk of diabetes, with each method having its clear strengths. Our extension to the Bottom-Up Summarization (BUS) algorithm produced the most suitable summary. The subpopulations identified by this summary covered most high-risk patients, had low overlap, and were at very high risk of diabetes.
Keywords: Agile model, Association rules, Association rule summarization, Data mining, Survival analysis, Fuzzy Clustering.
Title: Diabetes Mellitus Prediction System Using Data Mining
Author: Yamini Amrale, Arti Shedge, Sonal Singh, Anjum Shaikh
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
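The rule metrics at the heart of association rule mining, support and confidence, can be illustrated with a toy set of "risk factor" records (all values invented for the example):

```python
# Support: fraction of records containing an itemset.
# Confidence of rule LHS -> RHS: support(LHS | RHS) / support(LHS).
def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    return support(lhs | rhs, transactions) / support(lhs, transactions)

records = [
    {"obese", "hypertension", "diabetes"},
    {"obese", "diabetes"},
    {"obese", "hypertension"},
    {"hypertension"},
]
print(support({"obese"}, records))                   # 0.75
print(confidence({"obese"}, {"diabetes"}, records))  # 2/3
```

A summarization step, like the BUS algorithm the abstract mentions, would then prune and group the (typically thousands of) rules these metrics generate.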
Critical Research Appraisal Assignment - NUR501 Philosophi... (MargenePurnell14)
Critical Research Appraisal Assignment
NUR501: Philosophical & Theoretical, Evidence-Based Research
Dr. Corzo-Sanchez
June 24, 2022
Critical Research Appraisal Assignment
Nursing research uncovers new knowledge to help build the foundation of clinical practice. Research can help prevent diseases and disabilities, help manage symptoms, establish new treatment plans and improve nursing skills. This is why nurses need to be able to participate in and analyze research, as this can bring positive outcomes to their careers and the health of their patients. There are two different types of research, quantitative and qualitative, that provide information and data. For this assignment, I chose one qualitative research that focuses on the stress and burnout experienced by nursing professionals and one quantitative analysis that explores nurses’ knowledge regarding hand hygiene. Each study will be evaluated thoroughly and analyzed.
Qualitative Research
The definition of qualitative research can be challenging. Qualitative research involves collecting and analyzing non-numerical data to understand concepts, opinions, or experiences (Morgan et al., 2021). This form of research explores deeper insights into real-world problems in an emergent and holistic way. Qualitative data can be collected using various methods such as interviews, focus groups, observations, and documentation analysis (Hoover, 2021). Qualitative research has been used in nursing for many years, but it was not the first method used in nursing. Before quantitative research, philosophical methods such as hermeneutics and phenomenology were the only options for professional inquiry (Butts & Rich, 2017). However, it was changed to qualitative research because its ways were incompatible with science. There are three major approaches to qualitative research, (1) ethnography, based on anthropology, (2) phenomenology, drawn from philosophy; and (3) grounded theory, drawn from sociology (Morgan et al., 2021). The use of qualitative studies is common due to its many strengths, such as providing multiple methods of data collection, more detailed information, and how it can refine and strengthen quantitative research. However, some of the limitations of this form of research are difficulty analyzing and collecting data while being more time-consuming.
Evaluating and Analyzing a Selected Qualitative Study
For the example of a qualitative study, I chose Luis M. Dos Santos's study, which focused on the effects of stress, burnout, and low self-efficacy in nursing professionals. The qualitative research aimed to understand and explore how social and environmental factors influence nursing professionals' self-efficacy. In the study, Social Cognitive Theory was used to define how each subject was affected based on their thoughts, behaviors, feelings, and personal beliefs (Dos Santos, 2020). For this research study, the phenomenological approach and analysis were used throughout the survey to collec ...
Chapter 19: Estimates, Benchmarking, and Other Measuremen... (EstelaJeffery653)
Chapter 19:
Estimates, Benchmarking, and Other Measurement Tools
Definition of Estimate
According to the dictionary, estimate "implies a judgment, considered or casual, that precedes or takes the place of actual measuring or counting or testing out" (Merriam-Webster's Collegiate Dictionary, 10th ed., s.v. "estimate").
Estimates
Such estimates may be of:
- Amount
- Value
- Size
Estimates
The first question should be: "Is it capable of being estimated?" If so, using an estimate often involves a trade-off, such as gaining a quick answer that is less accurate. So relying on estimates for input to reports means sacrificing some degree of accuracy.
Estimates
Four common uses of estimates are typically due to:
- Timeliness considerations
- Cost/benefit considerations
- Lack of data
- Internal monthly statements
Estimating the Ending Pharmacy Inventory
Certain healthcare organizations (or departments such as pharmacy) require accounting for inventory. Internal monthly statements usually use ending inventory estimates. The following slide (Figure 19-1) illustrates steps in the ending inventory computation.
Figure 19-1 Estimating the Ending Pharmacy Inventory.
The Scope of Estimates
Estimates can be extremely general, or they can reflect considerable judgment and well-thought-out detail. An example of a general estimate and its subsequent impact due to unrecognized indirect costs is illustrated in the next slide.
(Refer to the chapter, along with Figure 19-2, for full details about this example.)
Figure 19-2 Estimated Economic Impact of a New Specialist in a Physician Practice.
Other Estimates
Other commonly used computations are actually estimates. For example, the weighted average inventory cost described in Chapter 9 is in fact an estimate.
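The weighted-average inventory cost mentioned above can be sketched with invented purchase lots; this is only an illustration of the formula, not the chapter's worked example:

```python
# Weighted-average unit cost: total cost of all purchase lots divided
# by total units purchased. Multiplying by the units on hand gives an
# estimated ending inventory value.
def weighted_avg_unit_cost(layers):
    """layers: list of (units, unit_cost) purchase lots."""
    total_units = sum(u for u, _ in layers)
    total_cost = sum(u * c for u, c in layers)
    return total_cost / total_units

lots = [(100, 2.00), (200, 2.30), (100, 2.50)]  # invented pharmacy lots
avg = weighted_avg_unit_cost(lots)
print(round(avg, 3))        # 2.275
print(round(avg * 150, 2))  # estimated value of 150 units on hand: 341.25
```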
Performance Measures
A variety of performance measures must be in place for the organization. Many types of such measures are available. Generally, different organizations lean toward using one type over another.
Performance Measures
Adjusted performance measures over time. We have previously discussed the advantages of comparative analysis: the comparison of various time periods, one to another. Measures that compare performance over various time periods are especially effective. Figure 19-3 illustrates measures over time combined with a two-part case mix adjustment.
Financial Benchmarking
Benchmarking is the continuous process of measuring products, services, and activities against the best levels of performance. The best levels may be found inside the organization or outside it.
Benchmarking (1 of 3)
There are three types of benchmarks:
- A financial variable reported in an accounting system
- A financial variable not reported in an accounting system
- A nonfinancial variable
Benchmarks are used to measure performance gaps.
Benchmarking (2 of 3)
How do you benchmark? Three possible methods include:
- Studying the methods and results of your prime competitors;
- Examining the process of noncompetitors with a world-class ...
Text Extraction Engine to Upgrade Clinical Decision Support System (IJRTEM JOURNAL)
New-generation technological improvements lead patients to search for their symptoms and the corresponding diagnoses in online resources. This study aims to develop a machine learning model that suits users of varying availability. Most current systems let people choose related symptoms in web interfaces. In addition to these applications, the study aims to implement a new technique that extracts text-based symptoms and their related parameters, such as severity, duration, location, cause, and any accompanying indicators. The study applies to patients' everyday-language statements as well as medical expressions of symptoms, supporting an initial clinical decision system. Extracted terms are analyzed to match diagnoses, where an accuracy of 90% has been achieved.
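A toy version of the extraction idea, pulling a symptom with its severity and duration out of free text via regular expressions, might look like this; the vocabulary and patterns are invented and far simpler than the study's engine:

```python
# Extract (symptom, severity, duration) triples from everyday-language
# complaints with a regular expression over a tiny invented vocabulary.
import re

SYMPTOMS = r"(headache|cough|fever|nausea)"
PATTERN = re.compile(
    rf"(?P<severity>mild|moderate|severe)?\s*{SYMPTOMS}"
    r"(?:\s+for\s+(?P<duration>\d+\s+(?:day|week)s?))?",
    re.IGNORECASE,
)

def extract(text):
    m = PATTERN.search(text)
    if not m:
        return None
    return {"symptom": m.group(2).lower(),
            "severity": (m.group("severity") or "unspecified").lower(),
            "duration": m.group("duration")}

print(extract("I have had a severe headache for 3 days"))
```

A real engine would replace the fixed vocabulary with a medical terminology lookup and handle negation, misspellings, and paraphrase, which is where the machine learning model comes in.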
Bibliometric analysis highlighting the role of women in addressing climate ch... (IJECEIAES)
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have played a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... - IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
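The droop control method applied above can be sketched in a few lines: frequency is drooped against active power and voltage against reactive power. The nominal set-points and droop gains below are illustrative assumptions, not parameters from the paper.

```python
# Droop control sketch for one DG unit in an islanded microgrid: frequency
# droops with active power and voltage with reactive power. Set-points and
# gains are illustrative assumptions, not the paper's parameters.

F_NOM = 50.0   # nominal frequency (Hz)
V_NOM = 1.0    # nominal voltage (pu)
KP = 0.01      # P-f droop gain (Hz per pu active power)
KQ = 0.05      # Q-V droop gain (pu per pu reactive power)

def droop_setpoints(p_pu, q_pu):
    """Frequency and voltage references for measured P and Q (per unit)."""
    f_ref = F_NOM - KP * p_pu   # frequency sags as active load rises
    v_ref = V_NOM - KQ * q_pu   # voltage sags as reactive load rises
    return f_ref, v_ref

f, v = droop_setpoints(0.8, 0.4)
print(round(f, 3), round(v, 3))   # 49.992 0.98
```

Each DG unit then shares load changes without communication: a rise in active load pulls frequency down slightly, and every droop-controlled source increases output in proportion.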
Enhancing battery system identification: nonlinear autoregressive modeling fo... - IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
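A minimal sketch of NARX-style identification: the next output is regressed on lagged outputs and lagged inputs through a nonlinear (here quadratic) regressor fitted by least squares. The synthetic plant and model orders are illustrative assumptions; the paper identifies its model from real cell measurements.

```python
import numpy as np

# NARX sketch: y(k) predicted from lagged outputs y(k-1), y(k-2) and lagged
# input u(k-1), with a quadratic term capturing the nonlinearity. The
# synthetic data below stands in for real battery voltage/current records.

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 200)                       # input (e.g. current)
y = np.zeros(200)
for k in range(2, 200):                           # synthetic nonlinear plant
    y[k] = 0.6*y[k-1] - 0.1*y[k-2] + 0.5*u[k-1] - 0.2*u[k-1]**2

def narx_features(y, u, k):
    # regressor vector: [1, y(k-1), y(k-2), u(k-1), u(k-1)^2]
    return [1.0, y[k-1], y[k-2], u[k-1], u[k-1]**2]

X = np.array([narx_features(y, u, k) for k in range(2, 200)])
t = y[2:200]
theta, *_ = np.linalg.lstsq(X, t, rcond=None)     # identified parameters
pred = X @ theta
rmse = float(np.sqrt(np.mean((pred - t) ** 2)))
print(rmse < 1e-6)   # regressor matches the plant, so the fit is near-exact
```

In practice the regressor structure (lag orders, nonlinear terms) is unknown, which is where the neural-network form of NARX used in the paper earns its keep.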
Smart grid deployment: from a bibliometric analysis to a survey - IJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and attract
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... - IJECEIAES
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall weights of
passive methods (24.7%), active methods (7.8%), hybrid methods (5.6%),
remote methods (14.5%), signal processing-based methods (26.6%), and
computational intelligence-based methods (20.8%), based on the comparison
of all criteria together. Thus, it can be seen from the total weights
that hybrid approaches are the least suitable choice, while signal
processing-based methods are the most appropriate islanding detection
method to select and implement in power systems with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
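The AHP weighting step behind those percentages can be illustrated as follows. The 3x3 pairwise comparison matrix is a made-up example, not the paper's actual judgments (which span more criteria and all six method families).

```python
import numpy as np

# AHP sketch: derive priority weights from a pairwise comparison matrix and
# check consistency. The 3x3 matrix below is an illustrative example only.

A = np.array([
    [1.0, 3.0, 0.5],
    [1/3, 1.0, 0.25],
    [2.0, 4.0, 1.0],
])

# Approximate the principal eigenvector by averaging the normalized columns.
w = (A / A.sum(axis=0)).mean(axis=1)
w = w / w.sum()

# Consistency index CI = (lambda_max - n) / (n - 1), ratio CR = CI / RI.
n = A.shape[0]
lam = float((A @ w / w).mean())     # estimate of the principal eigenvalue
ci = (lam - n) / (n - 1)
ri = 0.58                           # Saaty's random index for n = 3
cr = ci / ri
print(w.round(3), cr < 0.1)         # weights sum to 1; CR < 0.1 is acceptable
```

A CR below 0.1 indicates the expert judgments are consistent enough for the derived weights to be trusted, which is the same acceptance rule applied in tools such as Expert Choice.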
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... - IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
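For reference, the THD of the injected current is the ratio of the RMS of the harmonic content to the fundamental amplitude. The amplitudes below are illustrative values, not the paper's measurements.

```python
import math

# THD sketch for an injected grid current: RMS of the harmonic amplitudes
# divided by the fundamental amplitude. Values are illustrative.

def thd(fundamental, harmonics):
    """Total harmonic distortion as a fraction (multiply by 100 for %)."""
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental

i1 = 10.0                        # fundamental current amplitude (A)
ih = [0.3, 0.2, 0.1]             # e.g. 5th, 7th, 11th harmonic amplitudes (A)
print(round(100 * thd(i1, ih), 2))   # 3.74 (percent)
```

Grid codes such as IEEE 519 bound this figure, which is why the reported 46% and 38% THD reductions matter for grid compatibility.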
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... - IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems depends on solar
radiation and cell temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
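The classic P&O loop that the modified methods build on can be sketched as follows, using an assumed quadratic P-V curve with its maximum at 30 V. The fixed step size shown here is exactly what causes the steady-state oscillation the paper targets.

```python
# Classic perturb-and-observe (P&O) MPPT sketch on an assumed P-V curve.
# The curve and the fixed 0.5 V step are illustrative; the paper's modified
# P&O adapts the step size to reduce steady-state oscillation.

def pv_power(v):
    # toy P-V curve with its maximum power point at v = 30 V
    return max(0.0, 100.0 - 0.25 * (v - 30.0) ** 2)

def perturb_observe(v0=20.0, step=0.5, iters=60):
    v, p_prev, direction = v0, pv_power(v0), 1.0
    for _ in range(iters):
        v += direction * step          # perturb the operating voltage
        p = pv_power(v)
        if p < p_prev:                 # power fell: reverse the perturbation
            direction = -direction
        p_prev = p
    return v

v_mpp = perturb_observe()
print(abs(v_mpp - 30.0) <= 0.5)        # True: settles within one step of MPP
```

The loop converges to the MPP but keeps oscillating within one step of it, which is the transient/steady-state trade-off that adaptive step sizing and the FLC address.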
Adaptive synchronous sliding control for a robot manipulator based on neural ... - IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the joint positions track the desired trajectory, synchronizes the
errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed; the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
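The chattering-reduction idea can be illustrated for a single joint: replace the discontinuous sign() in the switching term with a boundary-layer saturation. The gains below are illustrative; the paper's controller additionally tunes the switching gains with fuzzy logic and estimates unknown dynamics with a neural network.

```python
# Single-joint sliding-mode control sketch with a boundary-layer saturation
# in place of sign() to reduce chattering. Gains are illustrative only.

LAMBDA, K, PHI = 5.0, 2.0, 0.05   # surface slope, switching gain, layer width

def sat(s, phi):
    """Saturated (boundary-layer) substitute for sign(s)."""
    return max(-1.0, min(1.0, s / phi))

def smc_torque(e, e_dot):
    s = e_dot + LAMBDA * e         # sliding surface s = e' + lambda * e
    return -K * sat(s, PHI)        # switching control driving s -> 0

print(round(smc_torque(0.01, 0.0), 3))    # -2.0 (outside the boundary layer)
print(round(smc_torque(0.001, 0.0), 3))   # -0.2 (inside: proportional, smooth)
```

Outside the boundary layer the control acts like pure sliding mode; inside it degrades gracefully to a proportional law instead of switching at high frequency.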
Remote field-programmable gate array laboratory for signal acquisition and de... - IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
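The abstract does not detail the compression algorithm, so as an assumed illustration, a simple delta encoding shows how compressing slowly varying signals before transmission lets a debug module ship more samples per byte; the byte widths and ramp signal are made up.

```python
# Delta-encoding sketch of signal compression before transmission: encode
# each 16-bit sample as a 1-byte difference when the signal varies slowly.
# Signal and byte widths are illustrative assumptions, not the paper's
# algorithm (which reports an average compression ratio of 2.90).

def delta_compress(samples):
    """Return (first_sample, deltas) for a list of 16-bit samples."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return samples[0], deltas

signal = list(range(0, 2000, 10))          # slowly rising 16-bit signal
first, deltas = delta_compress(signal)

raw_bytes = 2 * len(signal)                # 16 bits per raw sample
# 2 bytes for the first sample, then 1 byte per delta that fits in int8
packed_bytes = 2 + sum(1 if -128 <= d < 128 else 2 for d in deltas)
print(round(raw_bytes / packed_bytes, 2))  # 1.99 for this ramp
```

Adaptive schemes switch encodings per block depending on how the signal behaves, which is how higher average ratios become possible on real debug traces.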
Detecting and resolving feature envy through automated machine learning and m... - IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba... - IJECEIAES
Rapidly and remotely monitoring solar cell system status parameters (solar irradiance, temperature, and humidity) is critical to enhancing system efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was realized experimentally. Three different regions in Egypt (Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored solar irradiance, temperature, and humidity data were visualized live in Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m² respectively during the solar day. The accuracy and rapidity of the monitoring results obtained using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly recommend Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell system station.
An efficient security framework for intrusion detection and prevention in int... - IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must monitor and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention rely on supervised learning, which requires a large amount of labeled training data; such datasets are difficult to source for a network as large as the IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ... - IJECEIAES
This research develops an incubator system that integrates the internet of things (IoT) and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. The data is then sent in real time to the IoT broker Eclipse Mosquitto using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory (LSTM) network method and displayed in a web application through an application programming interface (API) service. The experiments produced 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and the other attributes ranges from 0.23 to 0.48. Several experiments were then conducted to evaluate the model's predictions on the test data. The best results were obtained using a two-layer LSTM configuration, each layer with 60 neurons and a lookback setting of 6. This model produces an R² value of 0.934, with a root mean square error (RMSE) of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R² value was also evaluated for each attribute used as input, with values between 0.590 and 0.845.
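The lookback-6 windowing that feeds the LSTM, together with the RMSE and MAE metrics quoted above, can be sketched as follows. The sine-wave trace and the naive persistence predictor are illustrative stand-ins for the incubator data and the trained LSTM.

```python
import numpy as np

# Sliding-window ("lookback") preparation of a sensor series for sequence
# models, plus the RMSE/MAE metrics. Data and predictor are stand-ins.

def make_windows(series, lookback=6):
    """Turn a 1-D series into (X, y): `lookback` past steps -> next value."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = np.array(series[lookback:])
    return X, y

temp = np.sin(np.linspace(0, 6, 100))        # fake temperature trace
X, y = make_windows(temp, lookback=6)
print(X.shape, y.shape)                      # (94, 6) (94,)

pred = X[:, -1]                              # naive persistence prediction
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
mae = float(np.mean(np.abs(pred - y)))
print(rmse >= mae > 0)                       # RMSE >= MAE always holds
```

A trained model is worth its complexity only if it beats this persistence baseline on the same RMSE/MAE metrics.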
A review on internet of things-based stingless bee's honey production with im... - IJECEIAES
Honey is produced exclusively by honeybees and stingless bees, both of which are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey with a unique flavor profile. A key problem identified is that many stingless bee colonies collapse due to weather, temperature, and environmental conditions. It is critical to understand the relationship between stingless bee honey production and environmental conditions in order to improve honey production. Thus, this paper presents a review of stingless bee honey production and prediction modeling. About 54 previous studies have been analyzed and compared to identify the research gaps, and a framework for modeling the prediction of stingless bee honey is derived. The result presents a comparison and analysis of internet of things (IoT) monitoring systems, honey production estimation, convolutional neural networks (CNNs), and automatic identification methods for bee species. Based on image detection methods, the three most efficient approaches are DenseNet201 convolutional networks at 99.81%, CNN at 98.67%, and densely connected convolutional networks with YOLO v3 at 97.7%. This study is significant in assisting researchers to develop a model for predicting stingless bee honey output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero... - IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society, including interactions, financial activity, and global security domains such as the military and the battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. The system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using the compact Contiki/Cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on the detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced the packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJ at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbers - IJECEIAES
In real-world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers offer a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can differ from the non-membership value, in which case we can model using intuitionistic fuzzy numbers, as they provide flexibility by defining both membership and non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which generalize the polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by intuitionistic polygonal fuzzy numbers with n edges. The method is given in a simple tableau formulation and then applied to numerical examples for clarity.
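The standard definitions behind intuitionistic fuzzy sets (textbook notation, not copied from the paper) make the "hesitation" mentioned above precise:

```latex
% An intuitionistic fuzzy set A on a universe X assigns each x both a
% membership degree \mu_A(x) and a non-membership degree \nu_A(x):
A = \{ \langle x, \mu_A(x), \nu_A(x) \rangle : x \in X \},
\qquad 0 \le \mu_A(x) + \nu_A(x) \le 1 .
% The hesitation margin
\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)
% quantifies the hesitation; an ordinary fuzzy set is the special case
% \nu_A(x) = 1 - \mu_A(x), for which \pi_A(x) = 0.
```

Intuitionistic polygonal fuzzy numbers then specify piecewise-linear shapes for both functions, which is what makes a tableau-based simplex modification tractable.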
The performance of artificial intelligence in prostate magnetic resonance im... - IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networks - IJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. How well time-domain signals work for detecting epileptic activity is examined using the intracranial Freiburg Hospital (FH) database, the scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) database, and the Temple University Hospital (TUH) EEG corpus. To test the viability of this approach, two types of experiments were carried out: binary classification (preictal, ictal, or postictal, each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using the CHB-MIT database was 84.4%, the Freiburg database's time-domain signals gave an accuracy of 79.7%, and the highest accuracy of 94.02% was obtained in the TUH EEG database when comparing the interictal stage to the preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behavior - IJECEIAES
Modern life is strongly associated with the use of cars, but increasing acceleration and maneuverability lead to a dangerous driving style for some drivers. In these conditions, developing a method for tracking driver behavior is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. A set of input data was formed, including 20 descriptive features covering the environment, the driver's behavior, and the car's operating characteristics, collected using OBD-II. The generated data set is fed to a Kohonen network, where clustering is performed according to driving style and degree of danger. Assigning the driving characteristics to a particular cluster makes it possible to move to the individual indicators of a given driver and to consider individual driving characteristics. Applying the method makes it possible to identify potentially dangerous driving styles, which can help prevent accidents.
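A minimal Kohonen network (self-organizing map) sketch shows the clustering idea: map units compete for input vectors, and the winner and its neighbors are pulled toward each sample. The two-feature, two-cluster data below is a made-up stand-in for the paper's 20 OBD-II features.

```python
import numpy as np

# 1-D Kohonen map sketch: 4 units competing for 2-D driving-style features
# (e.g. mean speed, mean acceleration). Data and map size are illustrative.

rng = np.random.default_rng(1)
calm = rng.normal([0.2, 0.2], 0.05, (50, 2))      # cluster 1: calm driving
aggr = rng.normal([0.8, 0.8], 0.05, (50, 2))      # cluster 2: aggressive
data = np.vstack([calm, aggr])

w = rng.uniform(0, 1, (4, 2))                      # 4 map units, 2 features
for epoch in range(30):
    lr = 0.5 * (1 - epoch / 30)                    # decaying learning rate
    for x in rng.permutation(data):
        bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))   # best unit
        for j in range(4):                          # neighborhood update
            h = np.exp(-abs(j - bmu))               # simple 1-D neighborhood
            w[j] += lr * h * (x - w[j])

# After training, calm and aggressive samples win different best units.
bmu_calm = int(np.argmin(np.linalg.norm(w - [0.2, 0.2], axis=1)))
bmu_aggr = int(np.argmin(np.linalg.norm(w - [0.8, 0.8], axis=1)))
print(bmu_calm != bmu_aggr)                        # True: styles separated
```

Each cluster of map units then corresponds to a driving style, and the unit a driver's features fall into gives the "degree of danger" label.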
Hyperspectral object classification using hybrid spectral-spatial fusion and ... - IJECEIAES
Because of its spectral-spatial and temporal resolution over large areas, hyperspectral imaging (HSI) has found widespread application in object classification. HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, hyperspectral objects require a lot of time to classify; for this reason, spectral and spatial features are fused. Existing classification strategies lead to increased misclassification, and existing feature fusion methods are unable to preserve objects' inherent semantic features. This study addresses these research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining objects' intrinsic qualities. Finally, a soft-margins kernel is proposed for a multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian Pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE - DuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced aircraft family in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL) - MdTanvirMahtab2
This presentation describes the working procedure of Shahjalal Fertilizer Company Limited (SFCL), a government-owned company of Bangladesh Chemical Industries Corporation under the Ministry of Industries.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible with IDM8000 CCR. Backplane-mounted serial and TCP/Ethernet communication module for CCR remote access. IDM8000 CCR remote control over serial and TCP protocol.
• Remote control: parallel or serial interface.
• Compatible with the MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with backplane-mount serial communication.
• Compatible with commercial and defence aviation CCR systems.
• Remote control system for accessing CCR and allied systems over serial or TCP.
• Indigenized local support/presence in India.
• Easy configuration using DIP switches.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two types of water scarcity: physical water scarcity and economic water scarcity.
Sachpazis: Terzaghi Bearing Capacity Estimation in simple terms with Calculati... - Dr. Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
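Terzaghi's strip-footing formula can be evaluated directly: q_ult = c·Nc + γ·D·Nq + 0.5·γ·B·Nγ. The soil parameters below are an illustrative example using Terzaghi's bearing capacity factors for the undrained (φ = 0) case.

```python
# Terzaghi ultimate bearing capacity sketch for a strip footing:
#   q_ult = c*Nc + gamma*D*Nq + 0.5*gamma*B*Ngamma
# Soil parameters are illustrative; Nc = 5.7, Nq = 1.0, Ngamma = 0.0 are
# Terzaghi's factors for phi = 0 (undrained clay).

def terzaghi_strip(c, gamma, D, B, Nc, Nq, Ng):
    """Ultimate bearing capacity (kPa) of a strip footing."""
    return c * Nc + gamma * D * Nq + 0.5 * gamma * B * Ng

q_ult = terzaghi_strip(c=50.0,     # undrained cohesion (kPa)
                       gamma=18.0, # soil unit weight (kN/m^3)
                       D=1.5,      # foundation depth (m)
                       B=2.0,      # footing width (m)
                       Nc=5.7, Nq=1.0, Ng=0.0)
print(q_ult)   # 312.0 kPa
```

The allowable bearing pressure follows by dividing q_ult by a factor of safety, commonly 3 for shallow foundations.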
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Immunizing Image Classifiers Against Localized Adversary Attacks - gerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversarial training.
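The paper's volumization algorithm is not reproduced here; as a hypothetical illustration, one lossless way to lift a 2D grayscale image into a 3D volume is to stack its bit-planes along a depth axis, giving 3D convolutions a depth dimension to work with.

```python
import numpy as np

# Hypothetical volumization sketch (an assumption, not the paper's method):
# stack the eight bit-planes of a uint8 image along a depth axis.

def bitplane_volumize(img):
    """(H, W) uint8 image -> (8, H, W) binary volume of bit-planes."""
    planes = [(img >> b) & 1 for b in range(8)]
    return np.stack(planes).astype(np.uint8)

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
vol = bitplane_volumize(img)
print(vol.shape)                                   # (8, 4, 4)

# The transform is lossless: the image is exactly recoverable.
recon = (vol.astype(int) * (2 ** np.arange(8))[:, None, None]).sum(axis=0)
print(bool((recon == img).all()))                  # True
```

Any invertible 2D-to-3D lift preserves the classifier's information while changing the receptive-field geometry a localized patch attack must contend with.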
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams, from the hydrologist's survey of the valley before construction, through all involved disciplines (fluid dynamics, structural engineering, generation, and mains frequency regulation), to the transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co-editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Courier management system project report.pdf - Kamal Acharya
Nowadays it is very important for people to send or receive articles like imported furniture, electronic items, gifts, and business goods. People depend heavily on different transport systems, which mostly use manual ways of receiving and delivering articles. There is no way to track the articles until they are received, and no way to let customers know what happened in transit once they have booked articles. In such a situation, we need a system that completely computerizes cargo activities, including tracking the articles sent from time to time. This need is fulfilled by the Courier Management System, online software for cargo management staff that enables them to receive goods from a source, send them to the required destination, and track their status from time to time.
ISSN: 2088-8708
IJECE Vol. 7, No. 4, August 2017: 2215–2222
Learning from a Class Imbalanced Public Health Dataset: a Cost-based Comparison of …. (Rohini R. Rao)
formulate effective public health interventions [8]. For instance, Takeda et al. [9] used multiple logistic analyses to report significant associations between mental health and psychosocial stressors such as family relationships, pregnancy, and income.
However, these public health datasets often suffer from the “rare cases” or “rare classes” problem, which results in imbalanced classes in the training datasets [10]. For example, most health datasets usually contain very few cases of the target disease compared to the number of healthy patients [11-14]. In binary classification for medical diagnosis, the rare minority class refers to the positive instances or the target class, whereas the majority class is represented by the negative instances in the dataset. Class imbalance can also occur when the data collection process is limited, resulting in artificial imbalances. To be classified as a class-imbalanced dataset, the rarity should be between 0.1% and 10%.
This paper considers a health dataset which records the presence of chronic diseases, i.e., diabetes, heart disease and hypertension, in the population. Patient demographics, living conditions and socio-economic status are also recorded in the dataset. The authors attempt to build classifiers to predict the occurrence of any of the three chronic diseases in the population. The literature review indicates that no single classifier method yields the best result for all types of class-imbalanced training datasets [1]. The amount of class-imbalance bias depends on factors such as the classification method, the number of attributes in the dataset and the sample size [15]. The motivation for this work is to study the effect of class imbalance on various classifiers for the health dataset. The objective is to identify the best classifiers for the class-imbalanced health dataset through a cost-based comparison of classifier performance. This work is relevant to public health policy makers, who can use the classifiers to augment medical prognosis in the population and also identify the underlying patient features that are correlated with chronic diseases.
1.1. Class Imbalance Problem
The class imbalance problem is a challenge for classifier techniques because a standard classifier aims to maximize overall accuracy. Consider a two-class binary classifier for a health dataset, in which the outcomes are labeled as positive (P) or negative (N). The classifier accuracy can be computed by applying the classifier to a test dataset and comparing the classifier's results with the actual class labels. There are four possible outcomes: if the predicted value is P and the actual value is also P, it is a true positive (TP); if the actual value is N and it is predicted as P, it is a false positive (FP); when both the prediction and the actual value are N, it is a true negative (TN); and a false negative (FN) occurs when the prediction is N while the actual value is P. The class imbalance problem affects different classifiers in a variety of ways; for instance, in decision tree induction algorithms it results in small disjuncts [11]. Classifiers trained on class-imbalanced data are usually biased towards the majority negative class, and the accuracy of predictions for the minority target class is very poor. The class-imbalance phenomenon often produces classifiers with poor predictive recall, particularly when the positive label is the minority target class [4]. The problem of imbalanced data is also associated with asymmetric costs of misclassifying elements of different classes. For example, consider a binary classifier built for medical diagnosis: the cost of misdiagnosing a healthy patient as having a health condition (false positive) is less than the cost of diagnosing a sick patient as healthy (false negative). The FP case could lead to more diagnostic tests until the patient is confirmed healthy. The FN error could result in delayed diagnosis and, ultimately, loss of life. Therefore, in classifiers for medical diagnosis, the cost of an FN is more significant than the FP cost. While the FP cost can be calculated as the expense incurred in further testing, the cost of an FN is hard to quantify.
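As a concrete illustration of the four outcomes, the tally below is a minimal Python sketch (not part of the paper's WEKA workflow), using the P/N labels defined above:

```python
# Minimal sketch (not from the paper): tally the four confusion-matrix
# outcomes of a binary classifier with "P"/"N" class labels.
def confusion_counts(actual, predicted):
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == "P" and p == "P":
            counts["TP"] += 1      # predicted P, actually P
        elif a == "N" and p == "P":
            counts["FP"] += 1      # predicted P, actually N
        elif a == "N" and p == "N":
            counts["TN"] += 1      # predicted N, actually N
        else:
            counts["FN"] += 1      # predicted N, actually P
    return counts

# A classifier biased toward the majority N class misses positives (FNs).
actual    = ["P", "N", "N", "N", "P", "N"]
predicted = ["N", "N", "P", "N", "P", "N"]
print(confusion_counts(actual, predicted))  # {'TP': 1, 'FP': 1, 'TN': 3, 'FN': 1}
```

In the asymmetric-cost setting described above, the single FN in this toy output would dominate the total cost even though five of six instances are classified correctly.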
1.2. Learning from Class-Imbalanced Datasets
There are two broad approaches to finding effective classifiers for class-imbalanced datasets: the algorithm-specific approach and the data pre-processing approach. In the algorithm-specific approach, classifier methods that are known to work well on class-imbalanced datasets are used without modifying the dataset. For example, Weiss [12] advocates the use of instance-based learning methods, such as k-Nearest Neighbors or Support Vector Machines, to predict the minority class effectively. He found that, independent of the training size, linearly separable domains are not sensitive to imbalance. He also concludes that the non-greedy search techniques used in genetic algorithms make them more suitable for dealing with class imbalance. For decision tree algorithms, he suggests that splitting rules can be modified to ensure that both classes are addressed. Japkowicz [13] concluded that MultiLayer Perceptron based classifiers are not sensitive to class imbalance. Kernel-based support vector machine classifiers, clustering, and the use of densities to estimate target class membership are known to work well on class-imbalanced datasets [16]. Simulation studies show that class imbalance and high dimensionality affect the performance of classifiers such as k-nearest neighbors, diagonal linear discriminant analysis, random forests and support vector
machines with linear kernel basis [15]. Cost-sensitive learning methods modify existing algorithms using the
cost information. For instance, in a tree based classifier, the cost can be used to choose the splitting attribute
or manipulate the weight of training records [17].
There are also specific algorithms, such as two-phase rule induction, CREDOS and one-class classifiers, which have proved useful for classifying rare cases in training datasets [11], [16], [18-20]. The main idea of these classification methods is that the algorithm should concentrate on the instances that are difficult to learn [11], [12]. Hempstalk et al. [16] combine density estimation with class probability estimation for one-class classification. The PNrule algorithm uses a two-phase rule induction method: the first phase focuses on recall, while the second phase optimizes precision [18]. CREDOS effectively utilizes “ripple-down rules” to learn comparable or better models for a variety of rare classes [20]. In most cases, the costs of misclassification errors are not equal, and a cost-sensitive learning approach is required [19]. In cost-sensitive learning methods such as AdaCost and MetaCost, the cost is represented in a matrix and is used to generate a model with lower cost. Empirically, it is often reported that cost-sensitive learning outperforms random re-sampling [11].
The literature reviewed also indicates that ensembles, which aggregate the predictions of multiple classifiers, tend to be more accurate than the core classifiers [19]. Wang et al. [4] discuss an implementation of an ensemble of learning algorithms to recommend medication to diabetic patients. Ensemble methods include bagging, boosting and random forests [17]. Bagging uses a majority vote over multiple classifiers to make more accurate classifications. Boosting, on the other hand, uses adaptive sampling of instances, based on instance weights, to improve the performance of the classifiers. Boosting methods like SMOTEBoost and AdaBoost have been found to be effective in the rare-case scenario [19]. Blending is an ensemble method in which multiple algorithms are trained on the training data. Meta classifiers combine the predictions of multiple classifiers to make accurate predictions on unseen data.
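To make the bagging-style majority vote concrete, here is a small illustrative Python sketch (the data and function name are hypothetical, not WEKA's implementation):

```python
# Illustrative sketch: a bagging-style majority vote over the predictions
# of several base classifiers (one label list per classifier).
from collections import Counter

def majority_vote(predictions):
    # zip(*...) groups the i-th prediction of every classifier together
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

votes = [["P", "N", "N"],   # classifier 1
         ["P", "P", "N"],   # classifier 2
         ["N", "P", "N"]]   # classifier 3
print(majority_vote(votes))  # ['P', 'P', 'N']
```

The vote flips the second instance to P even though two of the three classifiers disagree on it individually, which is the intuition behind an ensemble outperforming its base classifiers.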
The data pre-processing approach to the class imbalance problem modifies the training dataset itself using various sampling techniques [4], [11-13], [21]. Basic sampling methods include under-sampling, which reduces the majority class instances, and over-sampling, in which the minority class instances are increased to match the number of majority class instances. The Synthetic Minority Over-sampling Technique (SMOTE) is widely used for the class imbalance problem. SMOTE is an over-sampling approach that creates synthetic minority class samples to match the number of majority class instances. SMOTE is reported to perform better than simple over-sampling, but it is computationally expensive to implement compared to sampling methods like random under-sampling [21]. However, other experiments have shown that simple under-sampling tends to outperform SMOTE in most situations [22]. The performance of classifiers using SMOTE has been found to vary with the number of dimensions in the training dataset [22]. Smart re-sampling can be deployed instead of cost-sensitive learning, as it can provide new information or eliminate redundant information for the learning algorithm [11]. The disadvantages of sampling are that random under-sampling can potentially remove certain critical instances, and random over-sampling can lead to over-fitting [11], [12]. The threshold-moving approach to the class imbalance problem does not involve any sampling. Certain classifiers, like Bayesian or decision tree induction, return a probability value along with the class label, which can be used to compute a new threshold. In class-balanced datasets the probability threshold is 0.5. In case of class imbalance, the results of the classifier can also be weighted based on costs. In general, threshold moving shifts the threshold so that the rare-class tuples are easier to classify. The threshold-moving technique is known to reduce the costly FN errors in classifiers used for medical diagnosis.
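A minimal sketch of the threshold-moving idea (illustrative Python, not the WEKA mechanism): lowering the decision threshold below 0.5 makes rare positives easier to predict.

```python
# Threshold moving, sketched: classify from the predicted probability of
# the positive class, with a threshold that can be moved below the
# default 0.5 used for class-balanced data.
def predict(prob_positive, threshold=0.5):
    return "P" if prob_positive >= threshold else "N"

probs = [0.90, 0.45, 0.30, 0.10]  # P(positive) for four instances
print([predict(p) for p in probs])                  # ['P', 'N', 'N', 'N']
print([predict(p, threshold=0.25) for p in probs])  # ['P', 'P', 'P', 'N']
```

Moving the threshold from 0.5 to 0.25 turns two would-be negatives into positives; any of them that are truly positive stop being FNs, at the price of possibly more FPs.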
1.3. Evaluating Classifier Performance
Traditional measures such as accuracy or misclassification rate are not good indicators of classifier performance on class-imbalanced datasets [11], [12]. If the target class is very rare, say 0.5%, correctly predicting all instances of the majority class alone achieves a very high accuracy of 99.5%. The measures of precision and recall are more relevant for class-imbalanced datasets [17]. Precision denotes the fraction of instances that are TPs among all instances predicted as P (TP+FP). Recall measures the fraction of TPs correctly predicted among all actual P instances (TP+FN). Classifiers with high recall have fewer FNs. Hence, for rare classes, the classifier should be evaluated on both recall and precision. Usually, in class-imbalanced datasets, the target class has much lower precision and recall than the majority class. Many practitioners have observed that for skewed class distributions the recall of the minority class is often 0, which means that no classification rules have been generated for the target class.
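The accuracy pitfall described above can be reproduced in a few lines (an illustrative sketch; the 0.5% rare class mirrors the example in the text):

```python
# Precision and recall from confusion counts, and why raw accuracy misleads:
# an "always negative" classifier on 1000 instances with a 0.5% target class.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

tp, fp, tn, fn = 0, 0, 995, 5  # all 5 positives are missed
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(accuracy)        # 0.995 -- looks excellent
print(recall(tp, fn))  # 0.0   -- but not a single positive is found
```

This is exactly the "recall of the minority class is often 0" failure mode: the accuracy of 99.5% hides the fact that no target-class instance is ever predicted.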
Commonly used graphical displays of classifier accuracy include the receiver operating characteristic (ROC) curve, the precision-recall curve (PRC) and cost curves. For a binary classifier, the ROC curve represents the trade-off between the TP rate and the FP rate [11], [14]. A ROC plot provides a single performance measure called the area under the ROC curve (AUC) score. The AUC score is 0.5 for “chance” classifiers, which indicates the lack of any statistical dependence and is equivalent to random guessing, and 1.0 for perfect classifiers. The AUC can be used to compare the performance of multiple classifiers, but it is not very useful for class-imbalanced datasets. Precision-recall curves are often used instead of ROC plots to represent accuracy on class-imbalanced datasets [23], [24]. The PRC plot shows precision values for corresponding recall or sensitivity values. While the baseline is fixed for ROC, the baseline of the PRC is determined as P / (P + N). The area under the PR curve, denoted AUC (PRC), is a better indicator for comparing multiple classifiers on class-imbalanced datasets [14]. The cost curve (CC) is an alternative to the ROC plot; it analyzes classification performance by varying operating points, which are based on class probabilities and misclassification costs [14]. The probability cost function (PCF) represents the operating points on the x-axis, and the normalized expected cost NE[C] accounts for the classification performance on the y-axis.
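For the dataset studied here, the PRC baseline can be computed directly (a sketch; the labeled-instance counts are those reported in Section 2):

```python
# PRC baseline for a class-imbalanced dataset, computed as P / (P + N).
# Counts are the labeled instances of the paper's dataset (Section 2).
P, N = 1311, 20982
baseline = P / (P + N)
print(round(baseline, 3))  # 0.059 -- a PRC hugging this line is chance-level
```

By contrast, the ROC baseline is the fixed diagonal (AUC 0.5) regardless of the class ratio, which is why the PRC is more informative here.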
1.4. Proposed Solution
The authors use the open-source WEKA tool to create classifiers using both the algorithmic approach and the sampling approach. The authors picked WEKA because of its popularity among researchers [25]. WEKA is a freely available, Java-based collection of many data mining implementations and visualization tools. Its easy-to-use GUI is well suited for non-technical users such as health care policy makers. Since the software is open-source, any researcher can modify the source and repeat the experiments to compare results. A cost-benefit analysis using WEKA was done, and classifier performance was compared to identify the best classifiers for the current class-imbalanced health dataset. In the classifiers defined for prediction of chronic health conditions, we are specifically interested in reducing the false negatives because they have a higher cost. The classifier should be able to predict a significant number of the minority or target class instances. Once the core classifiers were studied, the authors attempted to improve classifier performance using ensembles of classifiers and data sampling techniques.
2. RESEARCH METHOD
The data for this experiment was provided by the Rural Maternity and Child Welfare Homes (RMCWH) organization, the largest private integrated health care delivery network in Karnataka. RMCWHs are staffed by the Department of Community Medicine, Kasturba Medical College, Manipal, India. The dataset has a total of 22,598 instances and 53 attributes. The predictor variables record the patient's demographics, family details, socioeconomic status, and living conditions. The class attribute is a binary attribute which indicates whether the patient has one or more of the following chronic diseases: diabetes, heart disease or high blood pressure. The class is imbalanced, with 1311 patients with chronic illness and 20982 healthy patients. This dataset represents a rare case, wherein 5.8% of the total population is the rare positive class. The dataset is also unique because it contains 305 instances with missing class labels. The dataset contains an almost equal number of male and female records. Chronic disease was found in patients above the age of 40. The attributes most highly correlated with chronic illness occurrence are the age, gender and marital status of the patient. The patients with chronic diseases were also found to be from the higher income group.
Based on the literature review, the authors selected a subset of WEKA classifiers that are known to work well on class-imbalanced datasets [25]. Classifiers which can work with missing class values were chosen due to the large number of missing values in the health dataset. In the case of the chronic health dataset, the cost of FNs is much greater than the cost of FPs. Although it is possible to compute the cost of an FP in terms of the cost of diagnostic tests, the cost of late diagnosis and death cannot be easily quantified. The authors chose to represent the WEKA cost matrix in the ratio of 1:10, i.e., the cost of an FN is ten times the cost of an FP. The widely used stratified 10-fold cross-validation was deployed for testing the classifiers, due to its relatively low bias and variance [7], [17]. The core classifiers were compared in terms of total cost and true positive rate. Cost-benefit analysis was done on the results of the basic classifiers, and the cost function was minimized so as to lower total costs in general as well as reduce the total number of FNs in the classifier.
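The total-cost computation implied by the 1:10 cost matrix can be sketched as follows (a Python illustration; the TP/TN/FP counts below are back-calculated assumptions for the example, not figures reported in the paper):

```python
# Total misclassification cost under the paper's 1:10 cost matrix:
# an FN costs 10, an FP costs 1, correct predictions cost nothing.
COST = {"TP": 0, "TN": 0, "FP": 1, "FN": 10}

def total_cost(counts):
    return sum(counts[k] * COST[k] for k in COST)

# Illustrative counts (assumed for this sketch): 357 FNs as in the
# Bayesian Net row of Table 1, with the other cells chosen to match.
counts = {"TP": 954, "TN": 19048, "FP": 1934, "FN": 357}
print(total_cost(counts))  # 1934*1 + 357*10 = 5504
```

The 10x weight is why a classifier with many FNs (e.g. the instance-based learners in Table 1) accrues a far higher total cost than one with many FPs.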
After the best core classifiers had been identified, the authors conducted experiments to check whether cost-sensitive learning, filtered classifiers, and ensemble methods could improve the results. WEKA supports ensemble-based classification: boosting, bagging and blending. Boosting was done with AdaBoostM1 using different base classifiers to see if their results could be improved. Bagging with various base classifiers was performed to see if their results were enhanced by the separation of the data into samples. Blending was conducted using Stacking in WEKA, which is based on the stacked generalization method, using a diverse blend of algorithms. The choice of base classifiers for the ensemble was based on the assumption that the base classifiers are independent of each other and perform better than random guessing.
In the last experiment, the datasets were modified using under-sampling, over-sampling and SMOTE to see their impact on classifier performance. Each of these sampling techniques ensured that both class labels are balanced, using the WEKA filters “Resample,” “SpreadSubsample” and “SMOTE.” In the first strategy, the “SpreadSubsample” filter, which produces a random subsample of a dataset, was used. This filter performs under-sampling to ensure a uniform distribution of classes, which resulted in a dataset with 1074 positive instances and 1311 negative instances. In the second strategy, the “Resample” filter was used with the “biasToUniformDistribution” option to obtain an over-sampled dataset with replacement. This resulted in a dataset with 11299 positive class instances and 11140 negative class instances. The SMOTE filter was also used to resample the dataset using five nearest neighbors, generating 14421 positive instances and 20982 negative instances. The three datasets were then used to build classifiers using different classifier methods, and the results were ranked based on cost.
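The random under-sampling strategy can be sketched in a few lines of plain Python (an illustration of the idea, not the WEKA SpreadSubsample filter; the toy data and `label_of` accessor are hypothetical):

```python
# Random under-sampling sketch: keep every minority-class record and an
# equal-sized random sample of the majority class, yielding a balanced set.
import random

def undersample(records, label_of, seed=1):
    pos = [r for r in records if label_of(r) == "P"]
    neg = [r for r in records if label_of(r) == "N"]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + random.Random(seed).sample(majority, len(minority))

# Toy dataset: 100 negatives, 10 positives (about a 9% rare class).
data = [("n%d" % i, "N") for i in range(100)] + [("p%d" % i, "P") for i in range(10)]
balanced = undersample(data, label_of=lambda r: r[1])
print(len(balanced))  # 20 instances, 10 of each class
```

Note the shrinkage: the balanced set is far smaller than the original, which is precisely why under-sampling is cheap to train on, at the risk of discarding informative majority-class records.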
3. RESULTS AND DISCUSSION
In the first stage, core classifiers were built, and their performance on the class-imbalanced dataset was analyzed. The data also contains missing values, so only those classifiers which support missing class values were evaluated. The Support Vector Machine based WEKA implementation, LibSVM, produced an effective classifier with the sigmoid kernel, while other kernels like linear and radial resulted in “chance” classifiers equivalent to random guessing. This also implies that the solution space is not linearly separable. Chance classifiers were eliminated, and the remaining classifiers were shortlisted and ranked based on the total cost of the classifier (see Table 1). The ‘Total Cost’ was calculated based on the cost matrix; the costs were further reduced by performing a cost-benefit analysis to ensure a minimum number of FNs. The ‘Total Cost (Optimized)’ column represents the costs after the cost-benefit analysis. Fewer FNs (which result in lower cost) and high recall are desirable. The total numbers of FNs before and after cost-benefit analysis are tabulated. The AUC values in column 8 show that these classifiers are not equivalent to random guessing but can classify the data in spite of the class imbalance.
Table 1. Costs and performance of core classifiers (Top 10)

| Rank | WEKA Classifier | Total Cost | Total Cost (Optimized) | Recall | FNs (out of 1311) | FNs (Optimized) | Accuracy Rate (%) | AUC | AUC (PRC) |
|------|-----------------|------------|------------------------|--------|-------------------|-----------------|-------------------|-------|-----------|
| 1 | Bayesian Net | 5504 | 4953 | 0.73 | 357 | 174 | 89.72 | 0.925 | 0.379 |
| 2 | Naïve Bayesian | 5602 | 4988 | 0.72 | 367 | 143 | 89.69 | 0.924 | 0.379 |
| 3 | Logistic | 10480 | 4928 | 0.22 | 1021 | 172 | 94.21 | 0.924 | 0.387 |
| 4 | Random Tree | 10707 | 9662 | 0.24 | 993 | 806 | 92.06 | 0.657 | 0.142 |
| 5 | J48 | 11137 | 7467 | 0.17 | 1091 | 528 | 94.08 | 0.836 | 0.286 |
| 6 | VotedPerceptron | 11488 | 11327 | 0.14 | 1127 | 1105 | 93.97 | 0.572 | 0.117 |
| 7 | JRIP | 11447 | 11468 | 0.14 | 1122 | 1122 | 93.95 | 0.576 | 0.128 |
| 8 | LibSVM (sigmoid kernel) | 11945 | 11971 | 0.12 | 1153 | 1311 | 92.97 | 0.550 | 0.083 |
| 9 | SimpleCart | 12054 | 9158 | 0.09 | 1194 | 743 | 94.13 | 0.768 | 0.219 |
| 10 | IBK, K=5 | 12548 | 10029 | 0.05 | 1249 | 577 | 94.14 | 0.721 | 0.178 |
In general, most of the top classifiers exhibited low total cost, and the number of FNs was drastically reduced using the cost-benefit analysis. The Bayesian classifiers had the best cost performance among the core classifiers on the class-imbalanced health dataset (Table 1). A WEKA-based paired t-test showed no difference in performance between the Naïve Bayesian and Bayesian net classifiers. The Bayesian net classifier is preferred because initial exploration shows a strong correlation among the patient features. Bayesian classifiers are known to work well in situations like medical diagnosis, wherein the relationship between the attribute set and the class variable is non-deterministic. Bayesian classifiers are also robust to noise, irrelevant attributes and confounding factors that are not included in the classification. All the other algorithms, like Logistic, Random Tree and Voted Perceptron, have a high number of false negatives, which drastically increases the cost of the classifier. The results contradict those of Weiss [12], who advocated the use of instance-based learners for the class imbalance problem. The rule-based classifier JRIP, an implementation of a propositional rule learner, is well suited for handling class imbalance and appears in the top 10 classifiers. This result is in line with previous results which suggested that kernel-based SVMs work better in class imbalance problems [12], [13], [15]. Even though the classifier accuracy for all classifiers is high, between 89% and 94%, only the Bayesian classifiers have high recall (0.73) and a low number of FNs.
In the second experiment, the effect of ensemble methods like Random Forest, Boosting, Bagging, Stacking and Voting on these baseline classifiers was studied. The cost-sensitive learning and MetaCost implementations in WEKA were also evaluated. The classifier performance was again ranked based on total cost, which was further optimized using cost-benefit analysis, and the results are tabulated in Table 2.
Table 2. Costs and performance of Ensemble & Cost based Classifiers (Top 10)

| Rank | Classifier | Total Cost | Total Cost (Optimized) | Recall | FNs (out of 1311) | FNs (Optimized) | Accuracy Rate (%) | AUC | AUC (PRC) |
|------|------------|------------|------------------------|--------|-------------------|-----------------|-------------------|-------|-----------|
| 1 | Cost Sensitive (BayesNet) | 4968 | 5007 | 0.90 | 137 | 169 | 83.25 | 0.924 | 0.378 |
| 2 | Filtered class, Class Balancer (BayesNet) | 5150 | 5000 | 0.91 | 117 | 169 | 81.62 | 0.924 | 0.378 |
| 3 | Cost Sensitive (JRIP) | 5281 | 5448 | 0.85 | 197 | 214 | 84.26 | 0.857 | 0.262 |
| 4 | MetaCost (BayesNet) | 5426 | 5185 | 0.93 | 95 | 192 | 79.49 | 0.919 | 0.355 |
| 5 | Cost Sensitive (Logistic) | 5507 | 5607 | 0.85 | 196 | 223 | 83.21 | 0.904 | 0.312 |
| 6 | Bagging (BayesNet) | 5575 | 4969 | 0.72 | 368 | 171 | 89.85 | 0.924 | 0.380 |
| 7 | Filtered class, Class Balancer (Logistic) | 5618 | 5547 | 0.88 | 153 | 211 | 80.98 | 0.907 | 0.320 |
| 8 | Filtered class, Class Balancer (J48) | 6240 | 6287 | 0.70 | 399 | 403 | 88.12 | 0.797 | 0.231 |
| 9 | Vote (BayesNet with Logistic) | 6443 | 4936 | 0.60 | 520 | 191 | 92.09 | 0.928 | 0.394 |
| 10 | Vote (BayesNet with Random Forest) | 6543 | 4836 | 0.59 | 534 | 154 | 92.20 | 0.927 | 0.400 |
The cost-sensitive learning implementation with different core classifiers exhibited the best performance. In the case of the JRIP and Logistic classifiers, the cost-sensitive learning approach almost halves the total cost of the core implementation. The filtered classifier with the class balancer filter produces good results with the Logistic and J48 methods, which were previously affected by the class imbalance. However, the cost-benefit analysis sometimes led to an increase in overall cost even though the number of FNs was low. The increase in total cost was due to a massive increase in FPs as a result of threshold moving. As indicated in the literature, ensemble methods like Voting and AdaBoostM1 significantly increase the performance of classifiers on imbalanced datasets in comparison to core classifiers [19].
In the third experiment, the effect of balancing the classes by sampling was studied using under-sampling, over-sampling and SMOTE (Table 3). The results were ranked based on total cost, and a cost-benefit analysis was performed to see if costs could be reduced. In general, as indicated in the literature, under-sampling seems to work better than over-sampling and SMOTE [22]. The authors recommend random under-sampling as a solution for class-imbalanced datasets because it is computationally less expensive to implement than SMOTE or over-sampling. It also reduces the size of the dataset, which improves time complexity without sacrificing classification performance. In the case of J48 and IBK, all three sampling strategies improved the cost dramatically, indicating that J48 and IBK work better on class-balanced datasets. This result contradicts some previous empirical results: here, the sampling methods outperformed the cost-sensitive methods [11].
Table 3. Costs and performance of Classifiers using sampling techniques (Top 10)

| Rank | Classifier | Total Cost | Total Cost (Optimized) | Recall | Accuracy Rate (%) | FNs | FNs (Optimized) | AUC | AUC (PRC) |
|------|------------|------------|------------------------|--------|-------------------|-----|-----------------|-------|-----------|
| 1 | Oversampling, Random Tree | 766 | 530 | 1.00 | 96.58 | 0/11299 | 5 | 0.979 | 0.959 |
| 2 | Undersampling, JRIP | 1198 | 1349 | 0.92 | 84.10 | 91/1074 | 68 | 0.841 | 0.711 |
| 3 | Undersampling, BayesNet | 1215 | 756 | 0.91 | 86.04 | 98/1074 | 11 | 0.886 | 0.777 |
| 4 | Oversampling, J48 | 1265 | 1292 | 0.99 | 95.20 | 21/1074 | 14 | 0.963 | 0.934 |
| 5 | Undersampling, J48 | 1455 | 1170 | 0.89 | 83.89 | 119/1074 | 65 | 0.838 | 0.693 |
| – | Undersampling, VotedPerceptron | 1533 | 1039 | 0.88 | 85.53 | 132/1074 | 56 | 0.847 | 0.709 |
| 6 | Undersampling, Logistic | 1682 | 804 | 0.86 | 84.19 | 145/1074 | 11 | 0.875 | 0.748 |
| 7 | Undersampling, IBK k=5 | 2205 | 1096 | 0.82 | 78.49 | 188/1074 | 9 | 0.831 | 0.709 |
| 8 | Undersampling, Random Tree | 3271 | 1548 | 0.72 | 75.30 | 298/1074 | 0 | 0.750 | 0.611 |
| 9 | Undersampling, LibSVM (sigmoid kernel) | 3528 | 1431 | 0.71 | 70.57 | 314/1074 | 105 | 0.687 | 0.541 |
| 10 | Oversampling, IBK k=5 | 5033 | 3991 | 0.99 | 83.95 | 159/11299 | 956 | 0.962 | 0.938 |
4. CONCLUSION
This work is relevant to public health policy makers, who can use the classifiers to predict the occurrence of chronic disease in the population and also identify the factors that are correlated with chronic diseases. The classifiers will help health care providers improve their prognosis, diagnosis and treatment plans. Experiments were conducted based on various approaches suggested in the literature to tackle the class imbalance problem. WEKA-based classifiers were used to record and analyze classifier performance in terms of cost. The Bayesian classifiers were identified as the best classifiers for the class-imbalanced dataset. The authors recommend the Bayesian net classifier because of the underlying correlation among patient features. The cost-sensitive implementations and cost-benefit analysis can further reduce the total cost while maintaining accuracy. However, ensemble methods are a complex solution for which a huge number of configurations remain to be explored. Under-sampling is an efficient data pre-processing approach with low computational cost, and it is recommended for building cost-effective classifiers. Under-sampling can dramatically improve the performance of methods like J48 and IBK, which are affected by class imbalance. The current work assumes that the cost of an FN is ten times the cost of an FP. The work can be improved by quantifying the actual costs of classifier errors. In future work, the effect of feature selection on classifier cost will be studied. Though removing irrelevant features is not known to improve classification performance significantly, such features can slow down the classification process.
ACKNOWLEDGEMENT
We thank Dr. Harishchandra Hebbar, Professor, School of Information Sciences, Manipal, and the Department of Community Medicine, KMC, Manipal, for sharing their valuable data with us. We thank Dr. Veena Kamath, Professor, Department of Community Medicine, KMC, Manipal, for extending her subject expertise to us.
REFERENCES
[1] Tomar D, Agarwal S. “A survey on Data Mining approaches for Healthcare”, International Journal of Bio-Science
and Bio-Technology. 2013; 5(5): 241-266.
[2] Dissanayaka C, Abdullah H, Ahmed B, Penzel T, Cvetkovic D. “Classification of Healthy Subjects and Insomniac
Patients Based on Automated Sleep Onset Detection”. In International Conference for Innovation in Biomedical
Engineering and Life Sciences: ICIBEL2015; 2015; Putrajaya, Malaysia.
[3] Chien C, Pottie GJ, “A Universal Hybrid Decision Tree Classifier Design for Human”, In 34th Annual International
Conference of the IEEE EMBS; 2012; San Diego, USA.
[4] Wang Y, Li Pf, Tian Y, Ren Jj, Li Js, “A Shared Decision Making System for Diabetes Medication Choice Utilizing Electronic Health Record Data”, IEEE Journal of Biomedical and Health Informatics. 2016; pp(99): 1-1.
[5] Konda S, Balmuri KR, Basireddy RR, Mogili R, “Hybrid Approach for Prediction of Cardiovascular Disease Using
Class Association Rules and MLP”, International Journal of Electrical and Computer Engineering. 2016;
6(4):1800.
[6] Boris Milovic, Milan Milovic, “Prediction and Decision Making in Health Care using Data Mining”, International
Journal of Public Health Science, December 2012; 1(2): 69-78.
[7] Pollettini JT, Panico SRG, Daneluzzi JC, Tinós R, Baranauskas JA, Macedo AA, “Using Machine Learning
Classifiers to Assist Healthcare-Related Decisions: Classification of Electronic Patient Records”, Journal of Medical
Systems, 2012; 36: 3861-3874.
[8] Herland M, Khoshgoftaar TM, Wald R, “A review of data mining using big data in health informatics”, Journal of
Big Data, 2014; 1:2.
[9] Takeda F, Tamiya N, Noguchi H, Monma T, “Relation between Mental Health Status and Psychosocial Stressors
among Pregnant and Puerperium Women in Japan: From the Perspective of Working Status”, International Journal
of Public Health Science, 2012; 1(2): 37-48.
[10] Milovic B, Milovic M, “Prediction and Decision Making in Health Care using Data Mining”, International Journal
of Public Health Science, 2012; 1(2): 69-78.
[11] Chawla NV. Data Mining for Imbalanced Datasets: an Overview. In Data Mining and Knowledge Discovery
Handbook, Springer US; 2005, 853-867.
[12] Weiss GM, “Mining with Rarity: A Unifying Framework”, ACM SIGKDD Explorations Newsletter – Special Issue on Learning from Imbalanced Datasets, June 2004; 6(1): 7-19.
[13] Japkowicz N, “The Class Imbalance Problem: Significance and Strategies”, In the 2000 International Conference
on Artificial Intelligence [ICAI]; 2000; Las Vegas, USA.
[14] Saito T, Rehmsmeier M, “The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating
Binary Classifiers on Imbalanced Datasets”, PLoS ONE 10(3): e0118432. doi:10.1371/journal.pone.0118432
[15] Lusa L, Blagus R, “The class-imbalance problem for high-dimensional class prediction”, in 2012 11th International
Conference on Machine Learning and Applications; 2012.
[16] Hempstalk K, Frank E, Witten IH, “One-class Classification by Combining Density and Class Probability
Estimation”, Machine Learning and Knowledge Discovery in Databases, 2008; 5211: 505-519.
[17] Tan Pn, Steinbach M, Kumar V. Introduction to Data Mining: Pearson Publication; 2014.
[18] Joshi MV, Agarwal RC, Kumar V, “Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule
Induction”, in the 2001 ACM SIGMOD international conference on Management of data; 2001; New York, USA.
[19] Joshi MV, Agarwal RC, Kumar V, “Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?”, in
the eighth ACM SIGKDD international conference on Knowledge discovery and data mining; 2002; New York ,
USA.
[20] Joshi MV, Kumar V, “CREDOS: Classification using Ripple down Structure [A Case for Rare Classes]”, In the
2004 SIAM International Conference on Data Mining; 2004; Florida, USA.
[21] Dittman DJ, Khoshgoftaar TM, Wald R, Napolitano A, “Comparison of Data Sampling Approaches for
Imbalanced Bioinformatics Data”, In the Twenty-Seventh International Florida Artificial Intelligence Research
Society Conference; 2014; Florida.
[22] Blagus R, Lusa L, “SMOTE for high-dimensional class-imbalanced data”, BMC Bioinformatics, 2013; 14:106.
[23] Davis J, Goadrich M, “The Relationship between Precision-Recall and ROC Curves”, In 23rd International Conference on Machine Learning; 2006; Pittsburgh, USA.
[24] Janez Demsar, “Statistical Comparisons of Classifiers over Multiple Data Sets”, Journal of Machine Learning
Research, 2006; 7: 1-30.
[25] Frank E, Hall MA, Witten IH, “The WEKA Workbench. Online Appendix for ‘Data Mining: Practical Machine Learning Tools and Techniques’”, Morgan Kaufmann; 2016 [cited 2016 May 01]. Available from: http://www.cs.waikato.ac.nz/ml/weka/.