This document discusses examining the effect of feature selection on improving patient deterioration prediction in intensive care units. The authors apply feature selection techniques to laboratory test data from the MIMIC-II database to identify the most important laboratory tests for predicting patient deterioration. They find that feature selection can help reduce redundant tests, potentially saving costs and allowing earlier treatment. The selected features provide insights into critical tests without domain expertise. In future work, the authors plan to evaluate additional feature selection methods and classification algorithms on this task.
ICU Patient Deterioration Prediction: A Data-Mining Approach (csandit)
A huge amount of medical data is generated every day, which presents a challenge in analysing these data. The obvious solution to this challenge is to reduce the amount of data without information loss. Dimension reduction is considered the most popular approach for reducing data size and also for reducing noise and redundancies in data. In this paper, we investigate the effect of feature selection in improving the prediction of patient deterioration in ICUs. We consider lab tests as features; thus, choosing a subset of features means choosing the most important lab tests to perform. If the number of tests can be reduced by identifying the most important tests, then we can also identify the redundant tests. By omitting the redundant tests, observation time can be shortened and early treatment can be provided to avoid risk. Additionally, unnecessary monetary cost is avoided. Our approach uses state-of-the-art feature selection for predicting ICU patient deterioration from medical lab results. We apply our technique to the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.
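As a concrete illustration of choosing a subset of lab tests, the sketch below ranks features with a simple univariate filter (absolute correlation with the binary deterioration label) and keeps the top k. This is a generic stand-in, not the paper's actual selection method; the function name and the toy data are invented for illustration.

```python
import numpy as np

def rank_lab_features(X, y, k):
    """Rank lab-test features by absolute Pearson correlation with the
    binary deterioration label and return the indices of the top k.
    A simple univariate filter, shown only as an illustrative stand-in."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each lab-test column
    yc = y - y.mean()                # centre the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr))[:k]

# toy example: feature 0 tracks the label, feature 1 is noise
X = np.array([[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]])
y = np.array([0, 0, 1, 1])
top = rank_lab_features(X, y, k=1)   # -> [0]
```

In practice one would validate the selected subset with a downstream classifier rather than trusting the univariate ranking alone.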
Using the Bigtown Simulation Model to Predict the Impact of Enhanced Seven Day Services on Hospital Performance and Patient Outcomes
Poster from the 'Delivering NHS services, seven days a week' event held in Birmingham on 16 November 2013
More information about this event can be found at
http://www.nhsiq.nhs.uk/news-events/events/nhs-services-seven-days-a-week.aspx
IMPACT OF HEALTH INFORMATICS TECHNOLOGY ON THE IMPLEMENTATION OF A MODIFIED E... (hiij)
The Modified Early Warning System (MEWS) is based on a patient score that helps the medical team monitor patients and identify those who may be experiencing a sudden decline in condition. This study consists of a detailed review of clinical data and patient outcomes to assess the impact of technology on patient care. A total of thirteen hospitals are included in this review; these facilities have implemented vitals capture and the MEWS scoring system.
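To make the scoring concrete, here is a minimal MEWS calculator. The threshold bands follow the commonly published MEWS chart; individual hospitals tune these bands, so treat the values below as an illustrative sketch, not an authoritative clinical tool. The lowercase AVPU strings are an assumption of this sketch.

```python
def mews(sbp, hr, rr, temp_c, avpu):
    """Compute a Modified Early Warning Score from vital signs.
    Bands are illustrative; real deployments use locally validated charts."""
    score = 0
    # systolic blood pressure (mmHg)
    if sbp <= 70: score += 3
    elif sbp <= 80: score += 2
    elif sbp <= 100: score += 1
    elif sbp >= 200: score += 2
    # heart rate (beats/min)
    if hr < 40: score += 2
    elif hr <= 50: score += 1
    elif hr <= 100: score += 0
    elif hr <= 110: score += 1
    elif hr <= 129: score += 2
    else: score += 3
    # respiratory rate (breaths/min)
    if rr < 9: score += 2
    elif rr <= 14: score += 0
    elif rr <= 20: score += 1
    elif rr <= 29: score += 2
    else: score += 3
    # temperature (Celsius)
    if temp_c < 35.0 or temp_c >= 38.5:
        score += 2
    # neurological response on the AVPU scale
    score += {"alert": 0, "voice": 1, "pain": 2, "unresponsive": 3}[avpu]
    return score

normal = mews(sbp=120, hr=80, rr=12, temp_c=37.0, avpu="alert")   # -> 0
sick = mews(sbp=85, hr=125, rr=28, temp_c=39.0, avpu="voice")     # -> 8
```

A rising score across repeated observations is what triggers escalation in MEWS-based workflows.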
Are laboratory tests always needed? Frequency and causes of laboratory overu... (Hossamaldin Alzawawi)
This article discusses the importance of monitoring clinical laboratory resource utilization and how the team implemented a monitoring system to assess clinical laboratory overuse.
The Paperless partograph – The new user-friendly and simpler tool for monitor... (iosrjce)
IOSR Journal of Dental and Medical Sciences is one of the specialty journals in dental and medical science published by the International Organization of Scientific Research (IOSR). The journal publishes papers of the highest scientific merit and widest possible scope in all areas related to medical and dental science, and welcomes review articles, leading medical and clinical research articles, technical notes, case reports and others.
Capturing Patient-Reported Outcome (PRO) Data Electronically: The Past, Prese... (CRF Health)
Patient-reported outcomes (PROs) are an important means of evaluating the treatment benefit of new medical products. It is recognized that PRO measures should be used when assessing concepts best known by the patient or best measured from the patient's perspective. As a result, there is growing emphasis on well-defined and reliable PRO measures. In addition, advances in technology have significantly increased electronic PRO (ePRO) data collection capabilities and options in clinical trials. The movement from paper-based to ePRO data capture has enhanced the integrity and accuracy of clinical trial data and is encouraged by regulators. A primary distinction in the types of ePRO platforms is between telephone-based interactive voice response systems and screen-based systems. Handheld touchscreen-based devices have become the mainstay for remote (i.e., off-site, unsupervised) PRO data collection in clinical trials. The conventional approach is to provide study subjects with a handheld device running a device-based proprietary software program. However, an emerging alternative for clinical trials is called bring your own device (BYOD). Leveraging study subjects' own Internet-enabled mobile devices for remote PRO data collection (via a downloadable app or a Web-based data collection portal) has become possible due to the widespread use of personal smartphones and tablets. However, there are a number of scientific and operational issues that must be addressed before BYOD can be routinely considered as a practical alternative to conventional ePRO data collection methods. Nevertheless, the future for ePRO data collection is bright and the promise of BYOD opens a new chapter in its evolution.
An excellent article that uses predictive and optimization methods to reduce hospital readmissions.
Another great article, "Reducing hospital readmissions by integrating empirical prediction with resource optimization" (Helm, Alaeddini, Stauffer, Bretthauer, and Skolarus, 2016), describes how machine-learning modelling tools were used to determine root causes and produce individualized readmission estimates. The post-discharge monitoring schedule and work plans were then optimized in response to changes in each patient's health state.
Diagnosis of rheumatoid arthritis using an ensemble learning approach (csandit)
Rheumatoid arthritis is a disease whose cause is still unknown; exploring the field of medical data mining can be helpful in early diagnosis and treatment of the disease. In this study, a predictive model is proposed that diagnoses rheumatoid arthritis. The rheumatoid arthritis dataset was collected from 2,564 patients referred to a rheumatology clinic. For each patient, a record consisting of several clinical and demographic features is saved. After data analysis and pre-processing, three different methods are combined to choose proper features from among all the features. Various classification algorithms were applied to these features; among them, AdaBoost had the highest precision. In this paper, we propose a new classification algorithm, entitled CS-Boost, that employs the cuckoo search algorithm to optimize the performance of AdaBoost. Experimental results show that the CS-Boost algorithm enhances the accuracy of AdaBoost in predicting rheumatoid arthritis.
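For readers unfamiliar with the base learner being tuned, here is a minimal binary AdaBoost with decision stumps. The cuckoo-search optimization from the paper is out of scope and is replaced here by fixed settings; the data and function names are invented for illustration.

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=5):
    """Minimal binary AdaBoost (labels in {-1, +1}) using threshold
    stumps as weak learners. A sketch, not the paper's implementation."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # sample weights
    ensemble = []                      # (alpha, feature, threshold, sign)
    for _ in range(n_rounds):
        best = None                    # lowest weighted error this round
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = max(err, 1e-10)          # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)  # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    agg = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        agg += alpha * np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(agg >= 0, 1, -1)

# toy separable data: label is +1 when feature 0 is large
X = np.array([[1.0, 0.0], [2.0, 1.0], [6.0, 0.0], [7.0, 1.0]])
y = np.array([-1, -1, 1, 1])
model = fit_adaboost(X, y)
acc = (predict(model, X) == y).mean()   # -> 1.0
```

CS-Boost, as described in the abstract, would search over hyperparameters of this kind of loop (e.g. rounds, weak-learner settings) with cuckoo search instead of fixing them.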
Background: Hospitals commit significant tangible and intangible resources to an agreed plan through the scheduling of surgery on the operating theatre (OT) list. Postponement decreases efficiency by reducing throughput, which wastes resources and burdens the health system, and patients and their families face economic and emotional consequences. Since the postponement rate is a quality indicator, a control mechanism can be developed from the results. Postponement of scheduled elective operations results in inefficient use of operating room (OR) time on the day of surgery and inconveniences patients and families. Moreover, day-of-surgery (DOS) postponement creates a logistic and financial burden associated with extended hospital stays and repetition of pre-operative preparations, in some cases extending to repeated investigations, causing escalated costs, wasted time and reduced income.
Methodology: A cross-sectional study was conducted in ten general-surgery operation theatres of a tertiary care hospital. Data on scheduled, performed and postponed surgeries were collected from all operation theatres from March 1st to September 30th, 2018. A questionnaire was developed to identify reasons for postponement from all hospital stakeholders (surgeons, anaesthetists, nursing officers), and operation-theatre scheduling was further evaluated with time-series analysis using a moving-average technique.
Results: In total, 958 surgeries were scheduled, 772 were performed and 186 were postponed, a postponement rate of 19.42% in the cardiac surgery department during the study period. Exponential smoothing of the month-wise postponement rate shows the dynamics of the operating suites. To test throughput, the postponement rate was plotted against the postponed surgeries, and regression analysis shows a perfect linear relationship.
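The moving-average/exponential-smoothing step mentioned above can be sketched in a few lines. The monthly rates below are hypothetical, not the study's data.

```python
def exp_smooth(series, alpha=0.5):
    """Single exponential smoothing of a monthly rate series:
    s[t] = alpha * x[t] + (1 - alpha) * s[t-1], with s[0] = x[0].
    alpha is the weight given to the newest observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# hypothetical monthly postponement rates (%)
rates = [22.0, 18.0, 20.0, 16.0]
trend = exp_smooth(rates, alpha=0.5)   # -> [22.0, 20.0, 20.0, 18.0]
```

A larger alpha tracks month-to-month swings more closely; a smaller alpha emphasises the longer-run level of the postponement rate.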
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for... (Data Con LA)
Medical institutions, universities and software giants like Google and Microsoft are dedicating increasing resources to machine learning for healthcare. This is a very exciting but relatively young field, and best practices for methods and reporting of results are not yet fully established. I have 2.5 years of experience as a data scientist at a national cancer center working on clinical data, evaluating external vendors and peer-reviewing machine-learning-in-healthcare papers. The talk gives an overview of best practices in prototyping machine learning models on data from the patient electronic health record (EHR). The topics addressed are:
1. Introduction to the EHR
2. Overview of machine learning applications to the EHR
3. Cohort definition for survival problems
4. Data cleaning
5. Performance metrics
Excerpts of papers from renowned institutions will be critically reviewed. The material is intended to be useful not only to machine-learning-for-healthcare professionals, but to any practitioner dealing with very unbalanced datasets in the temporal domain; for example, customer churn prediction can be modeled as a survival problem.
Strategies for Considerations Requirement Sample Size in Different Clinical T... (IJMREMJournal)
ABSTRACT
The main problem facing any investigation is usually how to determine a sample size; some considerations are required for the sample size to establish efficacy and make the study realistic and well researched before it begins. This study aimed to determine the maximum possible sample size at different phases of clinical trials and to achieve the best accuracy in the results. We found that the maximum sample size for phase I was 75, based on a largest response rate of 20% and a minimal clinically important difference (MCID) of 15%; because the participants are often healthy, 15% is enough to show positive results for the transition to the second phase. For phase II clinical trials, the maximum sample size was 388, based on a 5% error and a largest response rate of 50%, with the response rate required to be no less than 20% according to the design used in this phase. Based on the endpoint and hazard ratio in phase III clinical trials, when the probability of survival in the treatment group equals the median probability of survival of 50%, we found a maximum sample size of 4,796. For phase IV, the maximum sample size is not affected by how large the population is and remains constant at as large a size as possible.
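The 5%-error, 50%-response-rate figure quoted above is close to what the standard Cochran formula for a proportion gives. The sketch below assumes that formula; it yields 385, while the abstract reports 388, which presumably reflects an additional design adjustment not specified here.

```python
import math

def cochran_n(z=1.96, p=0.5, e=0.05):
    """Cochran's sample-size formula for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2, rounded up.
    z: normal quantile for the confidence level (1.96 for 95%),
    p: anticipated proportion (0.5 is the worst case),
    e: margin of error."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

n = cochran_n()   # -> 385 at 95% confidence, p = 0.5, 5% margin
```

Using p = 0.5 maximises p(1 - p), so this n is conservative for any true response rate.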
In today's world a huge amount of data is widely available, so there is a need to turn this data into useful information, referred to as knowledge. This demand for knowledge discovery has led to the development of many algorithms for determining association rules. One of the major problems faced by these algorithms is the generation of candidate sets. The FP-tree algorithm is one of the most preferred algorithms for association rule mining because it produces association rules without generating candidate sets, but in the process it generates many CP-trees, which decreases its efficiency. In this research paper, an improved FP-tree algorithm with a modified header table, along with a spare table and the MFI algorithm for association rule mining, is proposed. This algorithm generates frequent item sets without using candidate sets or CP-trees.
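To see why candidate-set generation is the bottleneck these algorithms avoid, here is the brute-force baseline: enumerate every candidate subset of every transaction and count. This sketch is the naive approach that FP-growth-style algorithms improve on, not the paper's algorithm; the toy transactions are invented.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent-itemset mining: count every subset (up to
    max_size items) of every transaction, then filter by support.
    Exponential in transaction width, hence the need for FP-trees."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)                      # canonical item order
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
freq = frequent_itemsets(transactions, min_support=2)
# -> {('a',): 3, ('b',): 2, ('c',): 2, ('a', 'b'): 2, ('a', 'c'): 2}
```

An FP-tree reaches the same frequent itemsets by compressing the transactions into a prefix tree and mining it recursively, never materialising the candidate sets counted above.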
UNDERSTANDING CUSTOMERS' EVALUATIONS THROUGH MINING AIRLINE REVIEWS (IJDKP)
Data mining can serve as a strategic tool for determining customer profiles in order to learn customer expectations and requirements. Airline customers have different characteristics, and if passenger reviews about their trip experiences are correctly analyzed, companies can increase customer satisfaction by improving the services provided. In this study, we investigate customer review data for in-flight services of airline companies and derive customer models from these data. We apply two approaches: feature-based and clustering-based modelling. In feature-based modelling, customers are grouped into categories based on features such as cabin class flown and airline experienced. In clustering-based modelling, customers are first clustered via k-means and then modeled. We apply multivariate regression analysis to model customer groups in both cases, trying to understand how customers evaluate the given services and what the dominant characteristics of in-flight services are from the customer viewpoint.
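The clustering-based modelling step can be sketched with a minimal k-means. Centroids are seeded deterministically (farthest-point seeding) so the toy run below is reproducible; real use would seed randomly or with k-means++. The rating vectors are invented, not the study's data.

```python
import numpy as np

def _init_centroids(X, k):
    # deterministic farthest-point seeding: start at row 0, then
    # repeatedly add the row farthest from the centroids chosen so far
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - X[idx][None], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float).copy()

def kmeans(X, k, n_iter=20):
    """Minimal k-means: alternate nearest-centroid assignment and
    centroid recomputation. A sketch of the clustering step only."""
    centroids = _init_centroids(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)            # nearest centroid per row
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# toy passenger rating vectors: two satisfied, two dissatisfied
X = np.array([[9.0, 8.0], [8.0, 9.0], [2.0, 1.0], [1.0, 2.0]])
labels, centroids = kmeans(X, k=2)   # -> labels [0, 0, 1, 1]
```

In the study's pipeline, a regression model would then be fitted within each resulting cluster.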
With the ever-increasing number of documents on the web and in other repositories, organizing and categorizing these documents for the diverse needs of users by manual means is a complicated job, so a machine learning technique named clustering is very useful. Text documents are clustered by pairwise similarity using measures like cosine, Jaccard or Pearson. The best clustering results are seen when the overlap of terms across documents is small, that is, when clusters are distinguishable. Hence, to find document similarity for this problem we apply the link and neighbour notions introduced in ROCK: significantly similar documents are called neighbours, and the link of a pair of documents is their number of shared neighbours. This work applies links and neighbours to Bisecting K-means clustering for identifying seed documents in the dataset, as a heuristic for choosing which cluster to partition, and as a means to find the number of partitions possible in the dataset. Our experiments on real-world datasets showed a significant improvement in accuracy with minimal time.
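The link computation described above is compact enough to show directly: threshold cosine similarity to get neighbour sets, then count shared neighbours per pair. Whether a document counts as its own neighbour is a convention; this sketch includes it, and the toy term vectors are invented.

```python
import numpy as np

def rock_links(X, theta=0.5):
    """ROCK-style links: docs p, q are neighbours when cosine
    similarity >= theta (a doc is its own neighbour here), and
    links[p, q] = |N(p) intersect N(q)|, the shared-neighbour count."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    sim = (X @ X.T) / np.outer(norms, norms)   # pairwise cosine
    adj = (sim >= theta).astype(int)           # neighbour indicator
    return adj @ adj                           # shared-neighbour counts

# toy term vectors: d0 and d1 share terms, d2 is unrelated
docs = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
links = rock_links(docs, theta=0.5)
# links[0, 1] == 2 (they share neighbours {d0, d1}); links[0, 2] == 0
```

High link counts identify documents that belong together even when their direct term overlap is modest, which is what makes links useful for seeding and split decisions in Bisecting K-means.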
A predictive system for detection of bankruptcy using machine learning techni... (IJDKP)
Bankruptcy is a legal procedure that declares a person or organization a debtor. It is essential to ascertain the risk of bankruptcy at an early stage to prevent financial losses. To this end, different soft computing techniques can be employed. This study proposes a bankruptcy prediction system that categorizes companies based on their extent of risk; the system acts as a decision support tool for the detection of bankruptcy.
STUDENTS' PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE (IJDKP)
High-accuracy prediction of students' performance is helpful for identifying low-performing students at the beginning of the learning process, and data mining is used to attain this objective. Data mining techniques discover models or patterns in data and are very helpful in decision-making. Boosting is the most popular technique for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is one generation of boosting algorithm; it is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-classifications. In this paper, a students' performance prediction system using multi-agent data mining is proposed to predict the performance of students from their data with high prediction accuracy and to provide help to low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and the C4.5 single-classifier method. The results show that the SAMME boosting technique improves prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
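The key change SAMME makes to AdaBoost fits in one formula, shown below. With K classes, a weak learner only needs error below (K - 1)/K (random guessing) to earn positive weight, thanks to the extra ln(K - 1) term; with K = 2 the term vanishes and the weight reduces to the standard binary AdaBoost form.

```python
import math

def samme_alpha(err, n_classes):
    """Per-round classifier weight in SAMME multiclass boosting:
    alpha = ln((1 - err) / err) + ln(K - 1).
    Positive whenever err < (K - 1) / K, i.e. better than random."""
    return math.log((1 - err) / err) + math.log(n_classes - 1)

binary = samme_alpha(0.3, 2)   # == ln(0.7 / 0.3), the AdaBoost weight
multi = samme_alpha(0.6, 4)    # positive, since 0.6 < 3/4
```

This is why SAMME can boost weak multiclass learners directly instead of decomposing the task into binary sub-problems.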
Arabic words stemming approach using Arabic WordNet (IJDKP)
The big growth of Arabic internet content in recent years has raised the need for effective stemming techniques for the Arabic language. Arabic stemming algorithms can be classified into three categories: root-based approaches (e.g., Khoja), stem-based approaches (e.g., Larkey), and statistical approaches (e.g., N-gram). However, no stemming of this language is perfect: the existing stemmers have low efficiency. In this paper, we introduce a new stemming technique for Arabic words that also solves the problem of the plural form of irregular nouns in Arabic, called the broken plural. The proposed stem extractor provides very accurate results in comparison with other algorithms; consequently, search effectiveness is improved.
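To illustrate what the stem-based (light-stemming) family of approaches does, here is a toy affix stripper. The affix lists are a small illustrative subset; real stemmers such as Khoja or Larkey handle far more morphology, and simple affix stripping cannot solve the broken-plural problem that the paper targets.

```python
def light_stem(word):
    """Toy Arabic light stemmer: strip a definite-article prefix and
    one common suffix, keeping at least 3 letters of stem.
    Affix lists are illustrative, not exhaustive."""
    prefixes = ["وال", "بال", "كال", "فال", "ال"]
    suffixes = ["ات", "ون", "ين", "ها", "ية", "ه", "ة"]
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

stem = light_stem("الكتاب")   # strips the article "ال" -> "كتاب"
```

Broken plurals (e.g., a plural formed by internal vowel change rather than a suffix) leave no affix to strip, which is exactly why the paper turns to Arabic WordNet instead.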
The aim of this paper is to use text mining (TM) concepts in the field of healthcare systems. We know that nowadays decision making in health care involves a number of opinions given by groups of medical experts for specific diseases, presented in medical databases in the form of text; these decisions are then mined from the database with the help of data mining techniques. Text document clustering is considered a tool for performing information-based operations. For clustering, the K-means technique is normally used; in this paper we use the Bisecting K-means clustering technique, which is better than the traditional K-means technique. The objective is to study the revealed groupings of similar opinion types associated with the likelihood of physicians and medical experts.
An excellent article that uses predictive and optimization methods to reduce hospital readmissions.
Another great article, "Reducing hospital readmissions by integrating empirical prediction with resource optimization" (Helm, Alaeddini, Stauffer, Bretthaur, and Skolarus, 2016) describes how Machine Learning modeling tools were used to determine the root-causes and individualized estimation of readmissions. The post-discharge monitoring schedule and workplans were then optimized to patient changes in health states.
Diagnosis of rheumatoid arthritis using an ensemble learning approachcsandit
Rheumatoid arthritis is one of the diseases that it
s cause is unknown yet; exploring the field of
medical data mining can be helpful in early diagnos
is and treatment of the disease. In this
study, a predictive model is suggested that diagnos
es rheumatoid arthritis. The rheumatoid
arthritis dataset was collected from 2,564 patients
referred to rheumatology clinic. For each
patient a record consists of several clinical and d
emographic features is saved. After data
analysis and pre-processing operations, three diffe
rent methods are combined to choose proper
features among all the features. Various data class
ification algorithms were applied on these
features. Among these algorithms Adaboost had the h
ighest precision. In this paper, we
proposed a new classification algorithm entitled CS
-Boost that employs Cuckoo search
algorithm for optimizing the performance of Adaboos
t algorithm. Experimental results show
that the CS-Boost algorithm enhance the accuracy of
Adaboost in predicting of Rheumatoid
Arthritis.
Background Hospital contributes significantly tangible and intangible resources on a concurred plan by the scheduling of surgery on the OT list. Postponement decreases efficiency by declining throughput leads to wastage of resources hence burden to the nation. Patients and their family face economic and emotional implication due to the postponement. Postponement rate being a quality indicator controls check mechanism could be developed from the results. Postponement of elective scheduled operations results in inefficient use of the operating room (OR) time on the day of surgery. Inconvenience to patients and families are also caused by postponements. Moreover, the day of surgery (DOS) postponement creates logistic and financial burden associated with extended hospital stay and repetitions of pre-operative preparations to an extent of repetition of investigations in some cases causing escalated costs, wastage of time and reduced income. Methodology A cross-sectional study was done in the operation theaters of a tertiary care hospital in which total ten operation theaters of General Surgery Data of scheduled, performed and postponed surgeries was collected from all the operation theater with effect from March 1st to September 30th, 2018. A questionnaire was developed to find out the reasons for the postponement for all hospital’s stakeholders (surgeons, Anesthetist, Nursing Officer) and they were further evaluated time series analysis of scheduling of Operation Theater for moving average technique. Results Total 958 surgeries were scheduled and 772 surgeries performed were and 186 surgeries were postponed with a postponement rate of 19.42% in the cardiac surgery department during the study period. Month-wise postponement Rate exponential smoothing of time series data shows the dynamic of operating suits. To test throughput Postponement rate was plotted the postponed surgeries and on regression analysis is in a perfect linear relationship.
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA
Medical institutions, universities and software giants like Google and Microsoft are dedicating increasing resources to machine learning for healthcare. This is a very exciting but relatively young field. However, best practices for methods and reporting of results are not yet fully established. I have 2.5 years of experience as data scientist at a national cancer center working on clinical data, evaluating external vendors and peer reviewing machine learning in healthcare papers. The talk gives an overview of best practices in prototyping machine learning models on data from the patient electronic health record (EHR). The topics addressed are:1. Introduction to the EHR2. Overview of machine learning applications to the EHR3. Cohort definition for survival problems4. Data cleaning5. Performance metricsExcerpts of papers from renowned institutions will be critically reviewed. The material is intended to be useful not only to machine learning for healthcare professionals, but to practitioners dealing with very unbalanced dataset in the temporal domain. For example, customer churn prediction can be modeled as survival problem.
Strategies for Considerations Requirement Sample Size in Different Clinical T...IJMREMJournal
-------------------------------------------------------ABSTRACT ---------------------------------------------------
Usually the main problem face any investigation it how to determent a sample size, however, some
considerations required in sample size to conduct the efficacy and make realistic well-researched before began
study. This study aimed to determine the maximum possible sample size at different phases of clinical trials and
attempt to achieve the best accuracy of the results. To achieve that the maximum sample size in different phases
we found that the maximum sample size of phase I was (75) relies on largest response rate 20% and the minimal
clinically important difference (MCID) 15%, and because the participants are healthy often that means 15%
enough to show positive results of the transition to the second phase. for the phase II clinical trials; the
maximum sample size was (388) depend on the error 5% and largest response rate 50% when the response rate
should not be less than 20% according to the design used in this phase. Depend on the endpoint and hazard
ratio in phase III clinical trials when the probability of survival of the treatment group equal to median of the
probability of survival 50% we found that the maximum sample size (4796). For the phase IV the maximum
sample size in different phases of clinical trials does not affect whatever the large of the population size and
remains constant as large as possible size.
In today’s world there is a wide availability of huge amount of data and thus there is a need for turning this
data into useful information which is referred to as knowledge. This demand for knowledge discovery
process has led to the development of many algorithms used to determine the association rules. One of the
major problems faced by these algorithms is generation of candidate sets. The FP-Tree algorithm is one of
the most preferred algorithms for association rule mining because it gives association rules without
generating candidate sets. But in the process of doing so, it generates many CP-trees which decreases its
efficiency. In this research paper, an improvised FP-tree algorithm with a modified header table, along
with a spare table and the MFI algorithm for association rule mining is proposed. This algorithm generates
frequent item sets without using candidate sets and CP-trees.
UNDERSTANDING CUSTOMERS' EVALUATIONS THROUGH MINING AIRLINE REVIEWSIJDKP
Data mining can be evaluated as a strategic tool to determine the customer profiles in order to learn
customer expectations and requirements. Airline customers have different characteristics and if passenger
reviews about their trip experiences are correctly analyzed, companies can increase customer satisfaction
by improving provided services. In this study, we investigate customer review data for in-flight services of
airline companies and draw customer models with respect to such data. In this sense, we apply two
approaches as feature-based and clustering-based modelling. In feature-based modelling, customers are
grouped into categories based on features such as cabin flown types, experienced airline companies. In
clustering-based modelling, customers are first clustered via k-means clustering and then modeled. We
apply multivariate regression analysis to model customer groups in both cases. During this, we try to
understand how customers evaluate the given services and what dominant characteristics of in-flight
services can be from the customer viewpoint.
With the ever-increasing number of documents on the web and in other repositories, organizing and categorizing these documents for the diverse needs of users by manual means is a complicated job; hence a machine learning technique named clustering is very useful. Text documents are clustered by pairwise similarity of documents, using similarity measures such as Cosine, Jaccard, or Pearson. The best clustering results are seen when the overlap of terms between documents is small, that is, when clusters are distinguishable. Hence, for this problem, we apply the link and neighbor notions introduced in ROCK to find document similarity. A link specifies the number of shared neighbors of a pair of documents; significantly similar documents are called neighbors. This work applies links and neighbors to Bisecting K-means clustering for identifying seed documents in the dataset, as a heuristic measure for choosing a cluster to be partitioned, and as a means to find the number of partitions possible in the dataset. Our experiments on real-world datasets showed a significant improvement in accuracy with minimal time.
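The ROCK-style link computation mentioned above is easy to make concrete: neighbors are document pairs whose cosine similarity clears a threshold, and the link count of a pair is the number of neighbors they share. The toy term-frequency vectors below are illustrative, not from the paper's datasets:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def links(vectors, theta):
    """ROCK-style links: neighbors are pairs with cosine similarity
    >= theta; link(i, j) counts neighbors common to documents i and j."""
    n = len(vectors)
    # neighbor[i] = set of documents similar enough to i (including i).
    neighbor = [
        {j for j in range(n) if cosine_sim(vectors[i], vectors[j]) >= theta}
        for i in range(n)
    ]
    return {
        (i, j): len(neighbor[i] & neighbor[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

# Toy term-frequency vectors for four documents: two about one topic,
# two about another.
docs = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)
link_counts = links(docs, theta=0.5)
```

Documents 0 and 1 share two neighbors (each other plus themselves), while cross-topic pairs share none, which is exactly the signal used to seed and split clusters.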
A Predictive System for Detection of Bankruptcy Using Machine Learning Techniques (IJDKP)
Bankruptcy is a legal procedure that declares a person or organization a debtor. It is essential to ascertain the risk of bankruptcy at an early stage to prevent financial losses. In this perspective, different soft computing techniques can be employed to predict bankruptcy. This study proposes a bankruptcy prediction system that categorizes companies based on the extent of risk. The prediction system acts as a decision support tool for the detection of bankruptcy.
STUDENTS' PERFORMANCE PREDICTION SYSTEM USING MULTI-AGENT DATA MINING TECHNIQUE (IJDKP)
High prediction accuracy of students' performance is helpful for identifying low-performing students at the beginning of the learning process, and data mining is used to attain this objective. Data mining techniques discover models or patterns in data, which is very helpful in decision-making. Boosting is one of the most popular techniques for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is a well-known boosting algorithm; it is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-classifications. In this paper, a students' performance prediction system using Multi-Agent Data Mining is proposed to predict the performance of students from their data with high prediction accuracy and to help low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and the C4.5 single-classifier method. The results show that using the SAMME boosting technique improves prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
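Multiclass boosting of the kind SAMME provides can be tried directly in scikit-learn, whose `AdaBoostClassifier` implements the SAMME multiclass extension (the only algorithm in recent library versions). The Iris data below is a stand-in three-class problem, not the students' dataset from the paper:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Iris has three classes, so plain binary AdaBoost does not apply
# directly; SAMME handles the multiclass case without binary
# decomposition.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Default weak learner is a depth-1 decision tree (a "stump").
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

Boosting fifty stumps this way typically yields high accuracy on such a simple three-class problem.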
Arabic Words Stemming Approach Using Arabic WordNet (IJDKP)
The rapid growth of Arabic internet content in recent years has raised the need for effective stemming techniques for the Arabic language. Arabic stemming algorithms can be classified into three categories: root-based approaches (e.g., Khoja), stem-based approaches (e.g., Larkey), and statistical approaches (e.g., N-gram). However, no stemmer for this language is perfect: the existing stemmers have low efficiency. In this paper, we introduce a new stemming technique for Arabic words that also solves the problem of the plural form of irregular nouns in Arabic, which is called the broken plural. The proposed stem extractor provides very accurate results in comparison with other algorithms. Consequently, search effectiveness is improved.
The aim of this paper is to use Text Mining (TM) concepts in the field of health care systems. We know that nowadays decision-making in health care involves a number of opinions given by groups of medical experts for specific diseases, presented in medical databases in the form of text. These decisions are then mined from the database with the help of data mining techniques. Text document clustering is considered a tool for performing information-based operations. For clustering, the K-means technique is normally used; in this paper we use the Bisecting K-means clustering technique, which performs better than traditional K-means. The objective is to study the revealed groupings of similar opinion types from physicians and medical experts.
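Bisecting K-means can be sketched without any special library support: repeatedly split the largest cluster in two with ordinary 2-means until the desired number of clusters is reached. The opinion snippets below are hypothetical stand-ins for the medical texts, and splitting the largest cluster is one common heuristic (the paper may choose the cluster to split differently):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def bisecting_kmeans(X, n_clusters):
    """Repeatedly bisect the largest cluster with 2-means."""
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        # Pick the largest cluster to split (one common heuristic).
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

# Hypothetical physician-opinion snippets.
opinions = [
    "surgery recommended immediately",
    "surgery is the best option",
    "medication and monitoring suggested",
    "continue medication with regular monitoring",
]
X = TfidfVectorizer().fit_transform(opinions).toarray()
groups = bisecting_kmeans(X, n_clusters=2)
```

On these four snippets the first split already separates the "surgery" opinions from the "medication" opinions.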
DATA MINING IN EDUCATION: A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE (IJDKP)
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, with data mining at the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining extracts the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with a data warehouse, and the whole process is divided into an action plan to be performed on the data: selection, transformation, mining, and results interpretation. In this paper, we review the Knowledge Discovery perspective in data mining and consolidate the different areas of data mining along with its techniques and methods.
The recruitment of new personnel is one of the most essential business processes affecting the quality of human capital within any company. It is highly essential for companies to recruit the right talent to maintain a competitive edge over others in the market. However, IT companies often face a problem while recruiting new people for their ongoing projects due to the lack of a proper framework that defines criteria for the selection process. In this paper we aim to develop a framework that allows any project manager to make the right decision when selecting new talent by correlating performance parameters with other domain-specific attributes of the candidates. Another important motivation behind this project is to check the validity of the selection procedure often followed by various big companies in both the public and private sectors, which focuses only on academic scores, GPA/grades, and other academic background. We test whether such a decision produces optimal results in industry, or whether there is a need for a change that offers a more holistic approach to the recruitment of new talent in software companies. The scope of this work extends beyond the IT domain, and a similar procedure can be adopted to develop recruitment frameworks in other fields as well. Data mining techniques provide useful information from historical projects, based on which the hiring manager can make decisions for recruiting a high-quality workforce. This study aims to bridge this hiatus by developing a data mining framework based on an ensemble learning technique to refocus the criteria for personnel selection. The results from this research clearly demonstrate the need to refocus the selection criteria on quality objectives.
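One way an ensemble learner can question GPA-only selection, in the spirit of the study above, is through feature importances: train a random forest on historical candidate data and see which attributes actually predict on-the-job performance. The candidate attributes and the performance rule below are entirely synthetic assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 400
# Hypothetical candidate attributes: GPA, coding-test score, years of
# project experience. In this synthetic setup, only the last two drive
# on-the-job performance; GPA is pure noise.
gpa = rng.uniform(2.0, 4.0, n)
coding = rng.uniform(0, 100, n)
experience = rng.uniform(0, 10, n)
performed_well = (0.02 * coding + 0.3 * experience
                  + rng.normal(0, 0.3, n)) > 2.2

X = np.column_stack([gpa, coding, experience])
forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X, performed_well)
importances = dict(zip(["gpa", "coding", "experience"],
                       forest.feature_importances_))
```

In this toy setup the forest assigns GPA a clearly lower importance than the attributes that actually generated the outcome, which is the kind of evidence the paper's framework would surface.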
A social bookmarking system is a web-based resource-sharing system that allows users to upload, share, and organize their resources, i.e., bookmarks and publications. Such systems have shifted the paradigm of bookmarking from an individual activity limited to the desktop to a collective activity on the web. They also enable users to annotate their resources with free-form tags, leading large communities of users to collaboratively create accessible repositories of web resources. The tagging process has its own challenges, such as ambiguous, redundant, or misspelled tags, and sometimes users avoid it because they have to come up with tags on their own. The resulting tag space is noisy or very sparse and dilutes the purpose of tagging. An effective solution is a tag recommendation system that automatically suggests an appropriate set of tags to the user while annotating a resource. In this paper, we propose a framework that does not depend on the tagging history of the resource or the user and is thereby capable of suggesting tags for resources being submitted to the system for the first time. We model the tag recommendation task as a multi-label text classification problem and use a Naive Bayes classifier as the base learner of the multi-label classifier. We experiment with Boolean, bag-of-words, and term frequency-inverse document frequency (TF-IDF) representations of the resources and fit an appropriate distribution to the data based on the representation used. The impact of feature selection on the effectiveness of the tag recommendation is also studied. The effectiveness of the proposed framework is evaluated through precision, recall, and F-measure metrics.
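The multi-label formulation described above can be sketched with scikit-learn by wrapping a Naive Bayes base learner in a one-vs-rest scheme over a TF-IDF representation. The resource titles and tag sets below are hypothetical, and one-vs-rest is one standard multi-label reduction (the paper's multi-label classifier may differ in detail):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical bookmarked resources with their tag sets.
titles = [
    "intro to python programming",
    "python web scraping tutorial",
    "guitar chords for beginners",
    "learn guitar scales and chords",
]
tags = [["python", "programming"], ["python", "web"],
        ["guitar", "music"], ["guitar", "music"]]

vec = TfidfVectorizer()
X = vec.fit_transform(titles)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# One binary Naive Bayes model per tag (multi-label via one-vs-rest).
clf = OneVsRestClassifier(MultinomialNB(alpha=0.1)).fit(X, Y)

# Suggest tags for a new resource with no tagging history.
pred = clf.predict(vec.transform(["python programming tutorial"]))
suggested = mlb.inverse_transform(pred)[0]
```

Because the classifier needs no history for the incoming resource itself, it matches the cold-start setting the paper targets.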
Column Store Databases: Approaches and Optimization Techniques (IJDKP)
A column-store database stores data column by column. The need for column-store databases arose from the demand for efficient query processing in read-intensive relational databases, for which extensive research has been performed on efficient data storage and query processing. This paper gives an overview of the storage and performance optimization techniques used in column stores.
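The row-store versus column-store distinction is easy to illustrate with plain Python containers; this is a conceptual sketch of the layouts, not any particular database engine:

```python
# Row store: one record per dict; a scan touches whole records.
rows = [
    {"id": 1, "price": 10.0, "qty": 3},
    {"id": 2, "price": 20.0, "qty": 1},
    {"id": 3, "price": 15.0, "qty": 4},
]

# Column store: one contiguous list per attribute.
columns = {
    "id": [1, 2, 3],
    "price": [10.0, 20.0, 15.0],
    "qty": [3, 1, 4],
}

# An aggregate over a single attribute reads one column instead of
# every full record -- the read-intensive case column stores target.
total_row_store = sum(r["price"] for r in rows)
total_col_store = sum(columns["price"])
```

Both layouts give the same answer; the column layout simply reads far less data per analytical query, which is the basis of the optimizations the paper surveys.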
On the business level, everyone wants to get hold of the business value and other organizational advantages that big data has to offer. Analytics has arisen as the primary path to business value from big data. Hadoop is not just a storage platform for big data; it is also a computational and processing platform for business analytics. Hadoop is, however, unsuccessful in fulfilling business requirements when it comes to live data streaming: the initial architecture of Apache Hadoop did not address live-stream data mining. In summary, the traditional equation of big data with Hadoop is false; focus needs to be given to business value as well. Data warehousing, Hadoop, and stream processing complement each other very well. In this paper, we review a few frameworks and products that provide real-time data streaming through modifications to Hadoop.
This paper describes the Collective Experience Engine (CEE), a system for indexing Experiential-
Knowledge about Web knowledge-sources (websites), and performing relative-experience calculations
between participants of the CEE. The CEE provides an in-browser interface to query the collective
experience of others participating in the CEE. This interface accepts a list of URLs, to which the CEE adds
additional information based on the Queryee's previously indexed Experiential-Knowledge. The core of the
CEE is its Experiential-Context Conversation (ECConversation) functionality, whereby a collection of a
person’s Web Experiential-Knowledge can be utilized to allow a real-world conversation-like exchange of
information to take place, including adjusting information-flow based on the Queryee's experiential
background and knowledge, and providing additional experientially-related knowledge integrated into the
answer from multiple selected 'experience donors'. A relative-experience calculation ensures that
information is retrieved only from relative-experts, to ensure sufficient additional information exists, but
that such information isn't too advanced for the Queryee to process. This paper gives an overview of the
CEE, and the underlying algorithms and data structures, and describes a system architecture and
implementation details.
Data Performance Characterization of Frequent Pattern Mining Algorithms (IJDKP)
Big data has quickly come under the spotlight in recent years. As big data applications are supposed to handle extremely large amounts of data, it is quite natural that the demand for computational environments that accelerate and scale out big data applications increases. The behavior of big data applications, however, is not clearly defined yet. Among big data applications, this paper specifically focuses on stream mining applications, whose behavior varies according to the characteristics of the input data. The parameters for data characterization are, however, not clearly defined yet, and there is no study investigating explicit relationships between the input data and stream mining applications either. Therefore, this paper picks frequent pattern mining as a representative stream mining application and interprets the relationships between the characteristics of the input data and the behavior of signature algorithms for frequent pattern mining.
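A defining trait of stream mining, as opposed to batch mining, is that frequencies must be maintained over a moving window of recent transactions. The class below is a minimal sliding-window frequency counter for single items; it is an illustrative stand-in, much simpler than the windowed frequent-pattern algorithms the paper characterizes:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Track item frequencies over the most recent `window` transactions."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()
        self.counts = Counter()

    def add(self, transaction):
        self.buffer.append(transaction)
        self.counts.update(transaction)
        if len(self.buffer) > self.window:
            # Evict the oldest transaction so counts reflect the window.
            expired = self.buffer.popleft()
            self.counts.subtract(expired)

    def frequent(self, min_support):
        return {item for item, c in self.counts.items() if c >= min_support}

# A toy transaction stream.
stream = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"], ["c"]]
w = SlidingWindowCounter(window=3)
for t in stream:
    w.add(t)
```

After the whole stream is consumed, only the last three transactions count, so the frequent set reflects recent behavior rather than the full history, and its contents shift with the input-data characteristics.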
A Simplified Approach for Quality Management in Data Warehouses (IJDKP)
Data warehousing is continuously gaining importance as organizations realize the benefits of decision-oriented databases. However, the stumbling block to this rapid development is data quality issues at various stages of data warehousing. Quality can be defined as a measure of excellence or a state free from defects. Users appreciate quality products, and the available literature suggests that many organizations have significant data quality problems with substantial social and economic impacts. A metadata-based quality system is introduced to manage the quality of data in a data warehouse. The approach analyzes the quality of a data warehouse system by checking the expected values of quality parameters against the actual values. The proposed approach is supported by a metadata framework that can store additional information to analyze the quality parameters whenever required.
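The expected-versus-actual check at the heart of the approach can be sketched as a simple comparison against metadata-stored thresholds. The parameter names, thresholds, and direction rules below are hypothetical illustrations, not the paper's actual metadata schema:

```python
# Hypothetical metadata: expected quality thresholds per parameter.
expected = {"completeness": 0.98, "accuracy": 0.95, "timeliness_hours": 24}
# Measured values for one warehouse load.
actual = {"completeness": 0.99, "accuracy": 0.93, "timeliness_hours": 12}

def quality_report(expected, actual):
    """Flag each parameter as passing or failing its expected value.
    Higher is better except for timeliness, where lower is better."""
    lower_is_better = {"timeliness_hours"}
    report = {}
    for param, target in expected.items():
        value = actual[param]
        ok = value <= target if param in lower_is_better else value >= target
        report[param] = ok
    return report

report = quality_report(expected, actual)
```

Here the load passes on completeness and timeliness but fails on accuracy, which is the kind of discrepancy the metadata framework would surface for analysis.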
Data stratification is the process of partitioning data into distinct, non-overlapping groups when the study population consists of subpopulations that are of particular interest. In clinical data, once the data is stratified into subpopulations based on a significant stratifying factor, different risk factors can be determined for each subpopulation. In this paper, Fisher's Exact Test is used to determine the significant stratifying factors. The experiments are conducted on a simulated study and on the Medical, Epidemiological and Social Aspects of Aging (MESA) data constructed for prediction of urinary incontinence. Results show that smoking is the most significant stratifying factor in the MESA data: smokers and non-smokers exhibit different risk factors for urinary incontinence and should be treated differently.
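The significance test driving the stratification is standard and available in SciPy. The 2x2 contingency table below is made up for illustration (the MESA counts are not given in the abstract):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = smoker / non-smoker,
# columns = incontinence present / absent.
table = [[30, 10],
         [15, 45]]
odds_ratio, p_value = fisher_exact(table)

# A small p-value suggests smoking status is a significant stratifying
# factor, so each stratum would then be analysed separately.
significant = p_value < 0.05
```

For this table the sample odds ratio is (30*45)/(10*15) = 9, and the association is highly significant, which would justify splitting the population into smoker and non-smoker strata before searching for risk factors.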
DRSP: Dimension Reduction for Similarity Matching and Pruning of Time Series Data Streams (IJDKP)
Similarity matching and joining of time series data streams have gained much relevance in today's world of large streaming data. The process finds wide application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of the data stream increases, so does the cost of retaining all the data in order to support similarity matching. We develop a novel framework to address the following objectives. First, dimension reduction is performed in a preprocessing stage, where the large stream data is segmented and reduced into a compact representation that retains all the crucial information, using a technique called Multi-level Segment Means (MSM); this reduces the space complexity associated with storing large time-series data streams. Second, the framework incorporates an effective similarity matching technique to analyze whether new data objects are symmetric to the existing data stream. Finally, a pruning technique filters out pseudo data-object pairs and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. We have performed exhaustive experimental trials to show that the proposed framework is both efficient and competitive in comparison with earlier works.
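The segment-means idea can be illustrated with a single-level piecewise-mean reduction: split the series into segments and keep one mean per segment. This is a simplified sketch in the spirit of MSM, not the authors' exact multi-level algorithm:

```python
import numpy as np

def segment_means(series, n_segments):
    """Reduce a time series to per-segment means (single-level sketch).

    Each segment of the raw series is replaced by its mean, shrinking
    storage by roughly len(series) / n_segments while preserving the
    coarse shape used for similarity matching.
    """
    series = np.asarray(series, dtype=float)
    segments = np.array_split(series, n_segments)
    return np.array([seg.mean() for seg in segments])

raw = [1, 3, 2, 4, 10, 12, 11, 13]
reduced = segment_means(raw, n_segments=2)
```

The eight-point series collapses to two means (2.5 and 11.5) that still capture its low-then-high shape; a multi-level scheme would keep such summaries at several resolutions.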
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH (cscpconf)
A huge amount of medical data is generated every day, which presents a challenge in analysing
these data. The obvious solution to this challenge is to reduce the amount of data without
information loss. Dimension reduction is considered the most popular approach for reducing
data size and also to reduce noise and redundancies in data. In this paper, we investigate the
effect of feature selection in improving the prediction of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a subset of features would mean choosing the
most important lab tests to perform. If the number of tests can be reduced by identifying the
most important tests, then we could also identify the redundant tests. By omitting the redundant
tests, observation time could be reduced and early treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deterioration using the medical lab results. We
apply our technique to the publicly available MIMIC-II database and show the effectiveness of
the feature selection. We also provide a detailed analysis of the best features identified by our
approach.
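Selecting a subset of lab tests, as proposed above, maps directly onto standard feature selection. The sketch below uses univariate scoring via scikit-learn's `SelectKBest` on synthetic stand-ins for lab values; the test names, the generating rule, and the scoring function are illustrative assumptions, not the paper's actual method or the MIMIC-II data:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(7)
n = 300
# Synthetic stand-ins for lab tests; only "lactate" and "creatinine"
# are made predictive of deterioration in this toy setup.
lactate = rng.normal(2.0, 1.0, n)
creatinine = rng.normal(1.0, 0.4, n)
sodium = rng.normal(140, 3, n)  # uninformative here
deteriorated = (lactate + 2 * creatinine + rng.normal(0, 0.5, n)) > 4.0

X = np.column_stack([lactate, creatinine, sodium])
names = ["lactate", "creatinine", "sodium"]

# Keep the k lab tests with the strongest univariate association.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, deteriorated)
selected = [names[i] for i in selector.get_support(indices=True)]
```

The tests that survive selection are the "most important" ones in the paper's sense; the ones dropped are candidates for the redundant tests whose omission saves observation time and cost.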
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION (IJCI JOURNAL)
Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data; in other words, it can be described as the extraction of information from a huge database. Data mining plays a vital role in many applications, such as business organizations, educational institutions, government sectors, the health care industry, and science and engineering. In the health care industry, data mining is predominantly used for disease prediction. Numerous data mining techniques exist for predicting diseases, namely classification, clustering, association rules, summarization, regression, etc. The main objective of this research work is to predict kidney disease using classification algorithms such as Naive Bayes and Support Vector Machines. The work focuses on finding the best classification algorithm based on classification accuracy and execution time. The experimental results show that the SVM performs better than the Naive Bayes classifier.
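The accuracy-and-runtime comparison described above can be sketched generically in scikit-learn. The breast-cancer dataset here is only a public stand-in for the kidney-disease data, and the specific models (Gaussian Naive Bayes, RBF-kernel SVM with scaling) are assumptions about the setup:

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Public medical dataset as a stand-in for the kidney-disease data.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, model in [
    ("naive_bayes", GaussianNB()),
    ("svm", make_pipeline(StandardScaler(), SVC())),
]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    results[name] = {
        "accuracy": model.score(X_te, y_te),        # classification accuracy
        "train_seconds": time.perf_counter() - start,  # execution time
    }
```

Recording both accuracy and training time per model mirrors the paper's two performance factors; on most tabular medical data the scaled SVM trades extra training time for higher accuracy than Naive Bayes.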
IMPACT OF HEALTH INFORMATICS TECHNOLOGY ON THE IMPLEMENTATION OF A MODIFIED EARLY WARNING SYSTEM (hiij)
The Modified Early Warning System (MEWS) is based on a patient score that helps the medical team monitor patients and identify a patient who may be experiencing a sudden decline. This study consists of a detailed review of clinical data and patient outcomes to assess the impact of the technology on patient care. A total of thirteen hospitals are included in this review; these facilities have implemented vitals capture and the MEWS scoring system.
Describing and Modeling Workflow and Information Flow in
Chronic Disease Care
KIM M. UNERTL, MS, MATTHEW B. WEINGER, MD, KEVIN B. JOHNSON, MD, MS,
NANCY M. LORENZI, PHD, MA, MLS
Abstract. Objectives: The goal of the study was to develop an in-depth understanding of work practices,
workflow, and information flow in chronic disease care, to facilitate development of context-appropriate
informatics tools.
Design: The study was conducted over a 10-month period in three ambulatory clinics providing chronic disease
care. The authors iteratively collected data using direct observation and semi-structured interviews.
Measurements: The authors observed all aspects of care in three different chronic disease clinics for over 150
hours, including 157 patient-provider interactions. Observation focused on interactions among people, processes,
and technology. Observation data were analyzed through an open coding approach. The authors then developed
models of workflow and information flow using Hierarchical Task Analysis and Soft Systems Methodology. The
authors also conducted nine semi-structured interviews to confirm and refine the models.
Results: The study had three primary outcomes: models of workflow for each clinic, models of information flow
for each clinic, and an in-depth description of work practices and the role of health information technology (HIT)
in the clinics. The authors identified gaps between the existing HIT functionality and the needs of chronic disease
providers.
Conclusions: In response to the analysis of workflow and information flow, the authors developed ten guidelines
for design of HIT to support chronic disease care, including recommendations to pursue modular approaches to
design that would support disease-specific needs. The study demonstrates the importance of evaluating workflow
and information flow in HIT design and implementation.
J Am Med Inform Assoc. 2009;16:826-836. DOI 10.1197/jamia.M3000.
Introduction
Health information technology (HIT) can enhance efficiency,
increase patient safety, and improve patient outcomes.1,2
However, features of HIT intended to improve patient care
can lead to rejection of HIT,3 or can produce unexpected
negative consequences or unsafe workarounds if poorly
aligned with workflow.4,5
More than 90 million people in the United States, or 30% of
the population, have chronic diseases.6 HIT can assist with
longitudinal management of chronic disease by, for example,
Affiliations of the authors: Department of Biomedical Informatics
(KMU, MBW, KBJ, NML), Center for Perioperative Research in
Quality (KMU, MBW, KBJ), Institute of Medicine and Public Health,
VA Tennessee Valley Healthcare System and the Departments of
Anesthesiology and Medical Education (MBW), Department of
Pediatrics (KBJ), Vanderbilt University, Nashville, TN.
This research was supported by a National Library of Medicine
Training Grant, Number T15 .
An AI-based decision platform built using a unified data model, incorporating systems biology topics for unit analysis using semi-supervised learning models
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Selection (IJARIIT)
Two approaches to building models for prediction of the onset of type 1 diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set, consisting of differences between test results taken at different times, was also used to build such classifiers. Supervised approaches such as decision trees were compared with unsupervised approaches for both types of classifiers. In this study, the system recommends the test most likely to confirm a diagnosis based on the pre-test probability computed from the patient's information, including symptoms and the results of previous tests. If the patient's post-test disease probability is higher than the treatment threshold, a diagnostic decision is made, and vice versa; otherwise, the patient needs more tests to help make a decision, and the system recommends the next optimal test and repeats the process. This work determines which approach is better on the diabetes dataset in the Weka framework, and also applies feature selection techniques to reduce the number of features and the complexity of the process.
Are you interested in learning how to prevent hospital readmissions for your diabetic population? It is a popular belief that measuring blood glucose is the most predictive variable in determining a hospital readmission for a diabetic patient; however, many care providers simply do not perform the test on known diabetic patients. This study looks at an advanced analytic method, working within the current healthcare provider's workflow, that identifies the likelihood of a future 30-day unplanned readmission before hospital discharge.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Generating a custom Ruby SDK for your web service or Rails API using Smithy
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION PREDICTION

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.6, November 2015
DOI: 10.5121/ijdkp.2015.5602
Noura AlNuaimi, Mohammad M Masud and Farhan Mohammed
College of Information Technology, United Arab Emirates University, Al-Ain, UAE
ABSTRACT
A large amount of heterogeneous medical data is generated every day in various healthcare organizations. These data could yield insights for improving monitoring and care delivery in the intensive care unit (ICU). At the same time, they present a challenge: reducing the amount of data without information loss. Dimension reduction is the most popular approach for reducing data size and for reducing noise and redundancy in data. In this paper, we investigate the effect of the average laboratory test value and the total number of laboratory tests in predicting patient deterioration in the ICU, where we consider laboratory tests as features. Choosing a subset of features thus means choosing the most important lab tests to perform. Our approach uses state-of-the-art feature selection to identify the most discriminative attributes, giving a better understanding of the patient deterioration problem. If the number of tests can be reduced by identifying the most important tests, then we can also identify the redundant tests. By omitting the redundant tests, observation time can be reduced and early treatment can be provided to avoid risk. Additionally, unnecessary monetary cost is avoided. We apply our technique to the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.
KEYWORDS
Data mining; patient deterioration; ICU; lab test; feature selection; learning algorithm
1. INTRODUCTION
The last decade has seen huge growth in the amount of data generated and collected in modern intensive care units (ICUs), as well as in the technologies used to analyse and understand it. ICUs are specialist hospital wards that provide intensive care (treatment and monitoring) for patients who are seriously ill and whose condition changes often. ICUs are considered a critical environment where decisions need to be taken carefully. These data could be used with the help of intelligent systems, such as data analytics and decision support systems, to determine which patients are at an increased risk of death. Making such decisions could allow healthcare professionals to take action at an early stage. For instance, patients in the ICU undergo a wide variety of medical laboratory tests on different body fluids (e.g. blood and urine). The nature of these tests and how often they are performed depend on why the patient is in the ICU and how stable the patient is.

Medical professionals may order laboratory tests to confirm a diagnosis or monitor a patient's health. However, deciding which test is likely to contribute information gain is a challenge.
Recent studies have demonstrated that frequent laboratory testing does not necessarily relate to
better outcomes [1].
Dimension reduction is a natural first solution for eliminating duplicate, useless and irrelevant features, and a typical step in solving machine learning problems by selecting the most discriminative attributes. In this paper, our goal is to propose an efficient mining technique that reduces observation time in ICUs by predicting patient deterioration at an early stage through data analytics. Our proposed technique makes several contributions. First, we use lab test results to predict patient deterioration. To the best of our knowledge, this is the first work that primarily uses medical lab tests to predict patient deterioration. Lab test results have a crucial role in medical decision making. Second, we identify the most important medical lab tests using state-of-the-art feature-selection techniques without using any informed domain knowledge. Finally, our approach helps reduce redundant medical lab tests. Thus, healthcare professionals can focus on the most important lab tests, which saves not only costs but also valuable time in recovering the patient from a critical condition.
The paper is organised as follows. Section 2 presents related work on predicting ICU death, Section 3 gives background on data mining, Section 4 illustrates our proposed approach, Section 5 summarises the MIMIC-II dataset, Section 6 describes the experiments, Section 7 discusses the findings, and finally, Section 8 presents the conclusion of this research.
2. LITERATURE REVIEW
This section reviews related work on predicting ICU death or the deterioration of ICU patients, a setting in which the ICU workflow produces large quantities of data that need further analysis. Most efforts here are intended to identify redundancy or overlap between medical laboratory tests. ICUs, like any other domain, need regular improvement of their processes and of the frequently requested medical laboratory tests. In this section, we highlight some similarities and differences between the related works and the proposed work.
In [2], the authors developed an integrated data-mining approach to give early deterioration warnings for patients under real-time monitoring in the ICU with real-time data sensing (RDS). They synthesised a large feature set that included first- and second-order time-series features, detrended fluctuation analysis (DFA), spectral analysis, approximate entropy and cross-signal features. Then, they systematically applied and evaluated a series of established data-mining methods, including forward feature selection, linear and nonlinear classification algorithms, and exploratory undersampling for class imbalance. We use the same dataset in our work; however, we use only the medical lab tests, and we rely on feature selection to reduce the size of the dataset.
A health-data search engine was developed in [3] that supported predictions based on summarised clusters of patient types, which the authors claimed were better than predictions based on the non-summarised original data. In our work, we use only the medical lab tests, and we attempt to highlight the most important ones.
Liu et al. [4] investigated the minimum number of features that was required for a given learning
machine to achieve "satisfactory" performance. In their work, an ad hoc heuristic method based
on feature-ranking algorithms was used to perform the experiment on six datasets. They found
that the heuristic method is useful in finding the critical feature dimension for large datasets. In our work, we also use ranking to rank the most useful features. However, we attempt to investigate what percentage of selected features is enough to obtain moderate model accuracy.
Cismondi et al. [1] proposed reducing unnecessary lab testing in the ICU. Their approach was designed to predict whether a proposed future laboratory test would likely contribute information gain and thereby influence clinical management in patients with gastrointestinal bleeding. In their experiment, there were 11 input variables in total. Ten of these were derived from bedside monitor trends (heart rate, oxygen saturation, respiratory rate, temperature, blood pressure and urine collections) as well as infusion products and transfusions. The final input variable was a previous value from one of the eight laboratory tests being predicted: calcium, PTT, hematocrit, fibrinogen, lactate, platelets, INR and hemoglobin. The outcome for each laboratory test was binary, defining whether the test result contributed information gain or not. Predictive modelling was applied to recognize unnecessary laboratory tests in a real-world ICU database extract comprising 746 patients with gastrointestinal bleeding. This work is the closest to our research; it has the same objective of reducing unnecessary laboratory tests. However, it focuses only on gastrointestinal bleeding, whereas we target all cases in the ICU. Besides that, they had constraints on the medical laboratory tests, specifying eight laboratory tests to be predicted.
Similarly, Joon Lee and David M. Maslove [5] used information theory to identify unnecessary laboratory testing and bloodwork. They investigated the information content of 11 laboratory test results from 29,149 adult ICU admissions in the MIMIC-II database. They used information theory to quantify the expected amount of redundant information both between laboratory values from the same ICU day and between consecutive ICU days. They found that most laboratory values showed a decreasing trend over time in the expected amount of novel information they contained. Platelet, blood urea nitrogen (BUN) and creatinine measurements exhibited the most redundant information on days 2 and 3 compared to the previous day, and the creatinine-BUN and sodium-chloride pairs had the most redundancy. In our work, we do not investigate any specific laboratory values; instead, we aim to identify the most critical laboratory tests that need more attention. Also, we do not depend on any domain knowledge or on intervention from medical experts.
Like the previous works, Hsieh et al. [6] worked on reducing unnecessary laboratory tests in the ICU. They proposed a computational-intelligence-based model to predict the survival rate of critically ill patients admitted to an intensive care unit (ICU). In their research, the prediction input variables were the physiological data from the first 24 hours of ICU admission, used to forecast whether the final outcome was survival or not. Their prediction model was based on a particle swarm optimization (PSO)-based Fuzzy Hyper-Rectangular Composite Neural Network (PFHRCNN) that integrated three computational intelligence tools: hyper-rectangular composite neural networks, fuzzy systems and PSO. In our work, we base our experiments on state-of-the-art feature-selection techniques, with no constraints on the input variables.
3. DATA MINING BACKGROUND
ICUs, like other healthcare sectors, face the need to analyse large amounts of data. Data mining offers great potential benefits for ICUs, enabling the systematic use of data and analytics to identify best practices that improve care and reduce costs. Clinical data mining is the application of data-mining techniques to clinical data. Data mining with clinical data has three objectives: understanding the clinical data, assisting healthcare professionals, and developing a data analysis methodology suitable for medical data [7].

Data mining is the analysis step of knowledge discovery. It is the 'extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data' [10]. When mining massive datasets, two of the most common, important and immediate problems are sampling and feature selection. Appropriate sampling and feature selection contribute to reducing the size of the dataset while obtaining satisfactory results in model building [4].
3.1. Feature Selection
In machine learning, feature selection (or attribute selection) is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used (a) to avoid overfitting and improve model performance, i.e. prediction performance in the case of supervised classification and better cluster detection in the case of clustering, (b) to provide faster and more cost-effective models and (c) to gain deeper insight into the underlying processes that generated the data. In the context of classification, feature selection techniques can be organized into three categories, depending on how they combine the feature selection search with the construction of the classification model: filter methods, wrapper methods and embedded methods, presented in Table 1 [8] [9]:
1) Filter Methods are based on applying a statistical measure to assign a scoring to each feature.
Then, features are ranked by score and either selected or removed from the dataset. The
methods are often univariate and consider the feature independently or with regard to the
dependent variable.
2) Wrapper Methods are based on the selection of a set of features as a search problem, where
different combinations are prepared, evaluated and compared to other combinations. A
predictive model is used to evaluate a combination of features and assign a score based on
model accuracy.
3) Embedded Methods are based on learning which features most contribute to the accuracy of
the model while the model is being created.
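To make the filter category concrete, its score-then-rank step can be sketched with scikit-learn's mutual information estimator. This is only an illustrative stand-in on synthetic data (the paper's own experiments use Weka's information-gain evaluator, described in Section 4), and all names and values here are hypothetical:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic example: 200 patients, 6 "lab test" features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)             # 0 = survived, 1 = died
X = rng.normal(size=(200, 6))
X[:, 0] += 2.0 * y                           # feature 0 is strongly informative
X[:, 1] += 1.0 * y                           # feature 1 is weakly informative

# Filter method: score every feature independently of any classifier...
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]           # ...then rank, best feature first

# Keep only the top-k features (the "selected" lab tests).
k = 2
X_selected = X[:, ranking[:k]]
print("ranking:", ranking)
print("selected shape:", X_selected.shape)
```

Because the scoring never consults a classifier, this is fast and scalable but, as Table 1 notes, blind to feature dependencies.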
Table 1: Feature selection categories

Filter
  Advantages: fast; scalable; independent of the classifier
  Disadvantages: ignores feature dependencies; ignores interaction with the classifier

Wrapper
  Advantages: simple; interacts with the classifier; models feature dependencies; less computationally intensive than randomized methods
  Disadvantages: risk of overfitting; more prone than randomized algorithms to getting stuck in a local optimum; classifier-dependent selection

Embedded
  Advantages: interacts with the classifier; better computational complexity than wrapper methods; models feature dependencies
  Disadvantages: classifier-dependent selection
3.2. Data Classification Techniques
Classification is a pattern-recognition task that has applications in a broad range of fields. It
requires the construction of a model that approximates the relationship between input features
and output categories [10]. Some of the most popular techniques are discussed here in brief, all of
which are used in our work.
1) The Naïve Bayes classifier is based on applying Bayes' theorem with strong independence assumptions between the features. The Naïve Bayes classifier is easy to implement, requires only a small amount of training data to estimate its parameters, and gives good results in most cases. However, its class-conditional independence assumption causes a loss of accuracy when dependencies between features exist [11].
2) Sequential minimal optimization (SMO) is an algorithm for efficiently solving the
optimization problem which arises during the training of support vector machines [12]. The
amount of memory required for SMO is linear in the training set size, which allows SMO to
handle very large training sets [13].
3) The ZeroR classifier simply predicts the majority category, which relies on the target and
ignores all predictors. Although there is no predictability power in ZeroR, it is useful for
determining a baseline performance as a benchmark for other classification methods [12].
4) A decision tree (J48) is fast to train and generally gives good results. Its output is human-readable, so one can check whether it makes sense, and tree visualizers aid understanding. It is among the most used data mining algorithms. The decision tree partitions the input space of a dataset into mutually exclusive regions, each of which is assigned a label, a value or an action to characterize its data points [12].
5) A RandomForest is a combination of tree predictors such that each tree depends on the values
of a random vector sampled independently and with the same distribution for all trees in the
forest [14].
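As a rough sketch, the five classifiers above have close scikit-learn analogues: GaussianNB for Naïve Bayes, a linear SVC for the SMO-trained SVM, DummyClassifier for ZeroR, DecisionTreeClassifier for J48, and RandomForestClassifier. The data here are synthetic and the mapping is approximate, since the paper's experiments use Weka's implementations:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lab-test feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# scikit-learn analogues of the five Weka classifiers discussed above.
classifiers = {
    "NaiveBayes": GaussianNB(),
    "SMO (SVM)": SVC(kernel="linear"),               # SMO trains an SVM
    "ZeroR": DummyClassifier(strategy="most_frequent"),
    "J48 (decision tree)": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    print(f"{name}: training accuracy = {clf.score(X, y):.3f}")
```

Note how the ZeroR analogue scores close to the majority-class fraction regardless of the features, which is exactly why it serves as a baseline.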
4. PROPOSED APPROACH
In this section we introduce our data mining technique for predicting ICU patient deterioration. Figure 1 shows the architecture of the proposed technique.
Figure 1: Architecture of the proposed approach.
The data are collected from the database of ICU patients (step 1). Then the data are integrated and cleaned, and relevant features are extracted (step 2). After that, feature selection or dimensionality reduction techniques are applied to obtain the best set of features and reduce the data dimension (step 3). Then the prediction model is learned using a machine learning approach (step 4). When a new patient is admitted to the ICU, the patient's data are collected incrementally (step 5). The patient data are evaluated by the prediction model (step 6) to predict the possibility of deterioration of the patient, and warnings are generated accordingly. Each of these steps is summarized here, and more details of the dataset are given in Section 5.
1. ICU Patient Data: The details of the data and the collection process are discussed in
Section 5.
2. Preprocessing: At the preprocessing stage, we used two different datasets, both generated from the Labevents table. The first dataset contained the average value of the applied medical tests, and the second contained the total number of times each test was applied.
3. Feature Selection / Dimension Reduction: Attribute selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. The goal here is to reduce the attributes so that medical professionals can identify the most important medical lab tests by removing the redundant ones. In our work, we select filter methods because they are moderately robust against the overfitting problem, configured as follows:
a. Attribute evaluator: InfoGainAttributeEval
b. Search method: Ranker
c. Attribute selection mode: use full training set
4. Learning: In our experiment we use classification with five of the most popular classifier techniques: the Naïve Bayes classifier, support vector machine (SVM), ZeroR classifier, decision tree (J48) and RandomForest. We use several types of machine learning algorithms to avoid results that are due to chance.
5. Model: The developed model aims to predict ICU patient deterioration by mining lab test
results. Thus, observation time can be reduced in the ICUs and more actions can be taken
in the early stages.
6. Prediction: After each new test result, medication event, etc., the patient data are
preprocessed and features are extracted to supply to the prediction model. The model
predicts the probability of deterioration for the patient. This probability may change
when new data (e.g. more test results) are accumulated and applied to the model. When
the deterioration probability reaches a certain threshold specified by the healthcare
providers, a warning is generated. This would help the healthcare providers to take
proactive measures to save the patient from getting into a critical or fatal condition.
7. New patient data: When a new patient is admitted to the ICU, all of the patient's information is stored in the database. Some of these data are incremental, such as vital sign readings, lab test results and medication events. The patient's data again go through the preprocessing and feature extraction phases before they can be applied to the model.
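Under the assumption of a scikit-learn stack (the paper itself uses Weka), steps 3-6 above can be sketched as a single pipeline. The feature counts, the warning threshold and the data below are all hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the preprocessed lab-test feature matrix (step 2).
rng = np.random.default_rng(1)
y_train = rng.integers(0, 2, size=400)        # final status: deteriorated or not
X_train = rng.normal(size=(400, 50))
X_train[:, :5] += y_train[:, None]            # first 5 "lab tests" carry signal

# Steps 3-4: feature selection followed by model learning, as one pipeline.
model = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=10)),
    ("learn", RandomForestClassifier(random_state=0)),
])
model.fit(X_train, y_train)

# Steps 5-6: score an incoming patient; warn above a provider-set threshold.
WARNING_THRESHOLD = 0.7                       # hypothetical value
x_new = rng.normal(size=(1, 50))
p_deterioration = model.predict_proba(x_new)[0, 1]
if p_deterioration >= WARNING_THRESHOLD:
    print(f"WARNING: deterioration risk {p_deterioration:.2f}")
else:
    print(f"risk {p_deterioration:.2f} below threshold")
```

Wrapping selection and learning in one pipeline mirrors the architecture of Figure 1: new patient data pass through the same feature-selection step before reaching the model.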
5. MIMIC II DATABASE
The MIMIC-II database is part of the Multiparameter Intelligent Monitoring in Intensive Care
project funded by the National Institute of Biomedical Imaging and Bioengineering at the
Laboratory of Computational Physiology at MIT, which was collected from 2001 to 2008 and
represents 26,870 adult hospital admissions. In our work, we use MIMIC-II version 2.6 because it is more stable than the newer version 3, which is still in the beta phase and needs further work of
cleaning, optimizing and testing. MIMIC-II consists of two major components: clinical data and
physiological waveforms.
The MIMIC dataset has three main features: (1) it is public; (2) it has a diverse and very large
population of ICU patients; and (3) it contains high temporal resolution data, including lab
results, electronic documentation, and bedside monitor trends and waveforms[15]. Several works
have used the MIMIC dataset, such as [16], [17] and [18].
In our work, we focus on the clinical data, specifically the LABEVENTS and LABITEMS tables. The Labevents table contains data from each patient's ICU stay, as described in Table 2, and Table 3 describes the lab items. We chose the medical lab data because we wanted to investigate the relationship between medical lab tests and patient deterioration, so that we could identify which medical tests have a major effect on clinical decision making. For example, the following information was recorded for a patient who was staying in the ICU and was given a medical test:
• Subject_ID: 2
• Hadm_ID: 25967
• IcuStay_ID: 3
• ItemID: 50468
• Charttime: 6/15/2806 21:48
• Value: 0.1
• ValueNum: 0.1
• Flag: abnormal
• ValueUOM: K/uL
Table 2: Labevents Table Description

Name        Type                         Null  Comment
SUBJECT_ID  NUMBER(7)                    N     Foreign key, referring to a unique patient identifier
HADM_ID     NUMBER(7)                    Y     Foreign key, referring to the hospital admission ID of the patient
ICUSTAY_ID  NUMBER(7)                    Y     ICU stay ID
ITEMID      NUMBER(7)                    N     Foreign key, referring to an identifier for the laboratory test name
CHARTTIME   TIMESTAMP(6) WITH TIME ZONE  N     The date and time of the test
VALUE       VARCHAR2(100)                Y     The result value of the laboratory test
VALUENUM    NUMBER(38)                   Y     The numeric representation of the laboratory test if the result was numeric
FLAG        VARCHAR2(10)                 Y     Flag or annotation on the lab result to compare it with the previous or next result
VALUEUOM    VARCHAR2(10)                 Y     The units of measurement for the lab result value
Table 3: Labitems Table

Name               Type           Null  Comment
ITEMID             NUMBER(7)      N     Table record unique identifier, the lab item ID
TEST_NAME          VARCHAR2(50)   N     The name of the lab test performed
FLUID              VARCHAR2(50)   N     The fluid on which the test was performed
CATEGORY           VARCHAR2(50)   N     Item category
LOINC_CODE         VARCHAR2(7)    Y     LOINC code for lab item
LOINC_DESCRIPTION  VARCHAR2(100)  Y     LOINC description for lab item
5.1. Medical Lab Tests Average Dataset
The dataset was constructed by taking the average test result of each patient for each kind of test and making it one attribute. Thus one patient is represented as one instance with 700 attributes, one for each test. If a test was not done, the value of that attribute is 0. For example, the first patient record in the dataset would look like this:

P_ID  Avg1  Avg2  ...  Avg700  Dead/Alive
1     5.3   10    ...  0       D
5.2. Total Number of Medical Lab Tests Dataset
The dataset was built by taking the total number of tests taken for each patient for each type of test and making it one attribute. Again one patient is represented as one instance with 700 attributes, one for each test. If a test was not done, the value of that attribute is 0. For example, the dataset would look like this:

P_ID  Count1  Count2  ...  Count700  Dead/Alive
1     5       0       ...  1         D
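Both per-patient datasets can be sketched as pivots over a LABEVENTS-style table with pandas. The column names follow Table 2; the toy rows and values below are hypothetical:

```python
import pandas as pd

# Toy LABEVENTS-style rows: one row per (patient, test, time) measurement.
labevents = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 1, 2, 2],
    "ITEMID":     [50468, 50468, 50090, 50468, 50112],
    "VALUENUM":   [5.3, 5.3, 10.0, 7.1, 3.2],
})

# Average dataset: one row per patient, one column per test (0 if not done).
avg = labevents.pivot_table(index="SUBJECT_ID", columns="ITEMID",
                            values="VALUENUM", aggfunc="mean", fill_value=0)

# Count dataset: number of times each test was performed (0 if not done).
cnt = labevents.pivot_table(index="SUBJECT_ID", columns="ITEMID",
                            values="VALUENUM", aggfunc="count", fill_value=0)

print(avg)
print(cnt)
```

With the full Labevents table this would yield the roughly 700-attribute instances described above, with `fill_value=0` playing the role of the "test not done" default.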
6. EXPERIMENTS
In the experiment section we investigate the effect of feature selection in improving the
prediction of patient deterioration in the ICUs. We consider the lab tests as features. Thus,
choosing a subset of features would mean choosing the most important lab tests to perform. If
the number of tests can be reduced by identifying the most important tests, then we would also
identify the redundant tests.
6.1. Experiment 1: Building a Baseline of the Medical Lab Tests Average
1) Experiment Goal: The goal of this experiment was to investigate the effect of lab testing on predicting patient deterioration. Usually, medical professionals compare the result of a lab test with a reference range [19]. If the value is not within this range, the patient may face fatal consequences; thus, the patient is kept under observation and the test is repeated during a specific period. In our experiment, we investigated the average value of the same repeated test and, more precisely, how the average value of lab results could assist medical professionals in evaluating patient status.
Since we dealt with real cases, the only way to assess the quality and characteristics of a data
mining model was through the final status of the patient, i.e. whether the patient survived or
not. Thus, our evaluation criterion was how accurately our approach could predict whether
the patient died or not.
2) Building the Dataset: The dataset was constructed by taking the average test result of each patient for each kind of test and making it one attribute, as described in Section 5.1. Thus one patient is represented as one instance with 700 attributes, one for each test. If a test was not done, the value of that attribute is 0. For example, the first patient record in the dataset would look like this:

P_ID  Avg1  Avg2  ...  Avg700  Dead/Alive
1     5.3   10    ...  0       D
3) Pre-processing: After building the dataset, some values could not be used because they were in text format; we used default values for these. The final dataset had 619 attributes and 2,900 instances.
4) Base learners: In our experiment we used five classification algorithms to construct the
model, namely NaiveBayes, SMO, ZeroR, J48 and RandomForest.
5) Evaluation: For a performance measurement, we did a 10-fold cross-validation of the dataset,
and the confusion matrix was obtained to estimate four measures: accuracy, sensitivity,
specificity and F-measure. As a result, RandomForest had the highest accuracy of 77.58%,
followed by SMO with 76.86%, J48 with 75.27%, ZeroR with 70.24% and NaiveBayes with
42.96%, as shown in Table 4, Figure 2 and Figure 3. RandomForest and SMO have the same
F-measures. The reason for the best performance by RandomForest is that it works relatively
well when used with high-dimensional data with a redundant/noisy set of features [14].
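The evaluation protocol was run in WEKA; the following scikit-learn sketch reproduces it (10-fold cross-validated accuracy over five learners) using rough stand-ins for the WEKA algorithms and synthetic data, so the numbers it prints are not the paper's results:

```python
# GaussianNB ~ NaiveBayes, LinearSVC ~ SMO, DummyClassifier ~ ZeroR,
# DecisionTreeClassifier ~ J48; the dataset here is synthetic.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

learners = {
    "NaiveBayes": GaussianNB(),
    "SMO-like": LinearSVC(dual=False),
    "ZeroR": DummyClassifier(strategy="most_frequent"),
    "J48-like": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

results = {}
for name, clf in learners.items():
    # 10-fold cross-validated accuracy, as in the paper's protocol.
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.2%}")
```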
Table 4: Experiment 1 confusion matrix results.

Algorithm   Learning Machine   Accuracy   Precision   Recall   F-Measure
Bayes       NaiveBayes         42.96%     0.672       0.430    0.404
Functions   SMO                76.86%     0.759       0.769    0.762
Rule        ZeroR              70.24%     0.493       0.702    0.580
Tree        J48                75.27%     0.749       0.753    0.751
Tree        RandomForest       77.58%     0.765       0.776    0.762
Figure 2: Experiment 1 accuracy result.
Figure 3: Experiment 1 detailed accuracy result.
6.2. Experiment 2: Average Medical Lab Tests Discriminative Attributes
1) Experiment Goal: The goal of this experiment was to select the most discriminative attributes,
i.e. a much smaller set of attributes that can describe the model almost as well. In this
experiment we worked to get the most out of the average medical lab test data, to gain a
better understanding of the patient deterioration problem.
2) Building the Dataset: In this experiment we used the same dataset that we used in experiment
1.
3) Pre-processing: In this stage, we used feature selection to select the most discriminative
attributes. For feature selection, we used weka.attributeSelection.CfsSubsetEval from WEKA
[20].
• Attribute Subset Evaluator: CfsSubsetEval
• Search Method: BestFirst.
• Evaluation mode: evaluate all training data
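scikit-learn has no CfsSubsetEval, so the sketch below implements a simplified version of the correlation-based merit that CFS optimizes, with a greedy forward search standing in for WEKA's BestFirst (synthetic data; plain Pearson correlation and a binary 0/1 class are assumed):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Hall's CFS merit: k * mean(|r_cf|) / sqrt(k + k(k-1) * mean(|r_ff|)).
    Rewards features correlated with the class but not with each other."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features=5):
    """Greedy forward search: add the feature that most improves the merit."""
    selected, remaining = [], list(range(X.shape[1]))
    best = 0.0
    while remaining and len(selected) < max_features:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:  # stop when no candidate improves the merit
            break
        best = merit
        selected.append(j)
        remaining.remove(j)
    return selected

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 10))
X[:, 3] += 2 * y              # make feature 3 strongly class-correlated
print(greedy_cfs(X, y))       # feature 3 should be picked first
```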
4) Base learner: Applying CfsSubsetEval reduced the attributes to 26 selected attributes. Now
the goal was to compare the reduced dataset with the baseline experiment result. We used the
same five classification algorithms to construct the model, namely NaiveBayes, SMO,
ZeroR, J48 and RandomForest. Please refer to Table 5.
Table 5: Experiment 2 confusion matrix results.

Algorithm   Learning Machine   Accuracy   Precision   Recall   F-Measure
Bayes       NaiveBayes         56.24%     0.774       0.562    0.564
Functions   SMO                74.82%     0.732       0.748    0.717
Rule        ZeroR              70.24%     0.493       0.702    0.580
Tree        J48                76.75%     0.765       0.768    0.766
Tree        RandomForest       79.75%     0.790       0.798    0.789
5) Evaluation: The accuracy results from this experiment and the first experiment are compared
in Table 6. NaiveBayes had the most significant increase, improving by 13.28%. J48 and
RandomForest improved slightly (by 1.48% and 2.17%, respectively), ZeroR was unchanged,
and SMO declined by 2.04%. Please refer to Table 6 and Figure 4.
Table 6: Accuracy comparison between Experiment 1 & Experiment 2.

Algorithm   Learning Machine   Original average dataset   Reduced average dataset   Change
Bayes       NaiveBayes         42.96%                     56.24%                    +13.28%
Functions   SMO                76.86%                     74.82%                    -2.04%
Rule        ZeroR              70.24%                     70.24%                    0.00%
Tree        J48                75.27%                     76.75%                    +1.48%
Tree        RandomForest       77.58%                     79.75%                    +2.17%
Figure 4: Accuracy comparison between Experiment 1 & Experiment 2.
6.3. Experiment 3: Average Medical Lab Tests Feature Selection
1) Experiment Goal: The goal of this experiment was to study the relationship between
feature selection and classification accuracy. Feature selection is a dimensionality
reduction technique that reduces the attribute space of a feature set. More precisely,
this experiment determines how many features are enough to give reasonable accuracy.
2) Building the Dataset: In this experiment we used the same dataset that we used in
experiment 1.
3) Pre-processing: In this experiment we built ten datasets according to the number of
selected features. The first dataset contained only 10% of the total attributes, and each
subsequent dataset added another 10%: dataset 1 contained 10% of the total attributes,
dataset 2 contained 20%, dataset 3 contained 30%, and so on, until dataset 10 contained
all 100% of the attributes.
For feature selection, we used weka.attributeSelection.InfoGainAttributeEval from WEKA,
which computes the information gain of each attribute with respect to the class [20].
• Attribute Subset Evaluator: InfoGainAttributeEval
• Search Method: Ranker.
• Evaluation mode: evaluate all training data
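The InfoGain + Ranker step and the ten nested datasets can be sketched with scikit-learn's mutual information estimator standing in for WEKA's InfoGainAttributeEval (synthetic data; in the paper, X would be the 619-attribute average dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]      # Ranker: best feature first

# Build the ten nested datasets: top 10%, 20%, ..., 100% of attributes.
reduced = {}
for pct in range(10, 101, 10):
    k = max(1, int(X.shape[1] * pct / 100))
    reduced[pct] = X[:, ranking[:k]]

print({pct: d.shape[1] for pct, d in reduced.items()})
```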
4) Base learner: After generating all of the reduced datasets, we used the J48 algorithm to
construct a model.
5) Evaluation: For each reduced dataset, we applied 10-fold cross-validation to evaluate
the accuracy. Table 7 shows the results in numbers, and Figure 5 shows them as a chart.
The results indicate that taking only the most relevant 10% of the total features gives
75.10% accuracy, which is comparable to the accuracy of the full feature set. This
indicates that not all of the features are required to get the highest accuracy. However,
there are some fluctuations; at 20%, for example, the accuracy drops a little. We conclude
that selecting 50 to 80% of the attributes gives moderately satisfying accuracy.
Table 7: Experiment 3 feature selection results.

% of Features Selected   # of Features Selected   Accuracy   Number of Leaves   Size of the Tree
10%                      62                       75.10%     200                399
20%                      124                      73.59%     201                401
30%                      186                      75.10%     185                369
40%                      248                      74.93%     179                357
50%                      310                      75.17%     189                377
60%                      371                      74.79%     187                373
70%                      433                      75.00%     189                377
80%                      495                      75.31%     184                367
90%                      557                      74.97%     183                365
100%                     619                      74.86%     184                367
Figure 5: Average datasets accuracy.
6.4. Experiment 4: Building a Baseline for the Total Number of Medical Lab Tests
1) Experiment Goal: The goal of this experiment was to investigate the effect of the total
number of lab tests conducted on predicting patient deterioration. Usually, medical
professionals keep requesting the same medical test over a brief period to compare the result
with a reference range [19]. If the value is not within the range, it means the patient may be in
danger, so the test is repeated again and again. Our goal was to predict at what total number a
medical professional should start immediate action and, more precisely, how the total number
of medical lab tests could assist the medical professional in evaluating the patient’s status.
2) Building the Dataset: The dataset was built by taking the total number of times each type of
test was performed for each patient and making it one attribute. One patient is thus
represented as one instance with 700 attributes, one for each test. If a test was not done, the
value of that attribute is 0.
For example, the dataset would look like this:

P_ID   Count1   Count2   ...   Count700   Dead/Alive
1      5        0        ...   1          D
3) Pre-processing: The dataset was randomized first, then two datasets were generated,
Count_Training_Validation_Dataset and Count_testing_Dataset. This step was repeated ten
times because we used randomization to distribute the instances between the two datasets.
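The repeated randomized split described above can be sketched as follows (the 70/30 ratio and the RandomForest stand-in are assumptions, since the paper does not state the exact split ratio; the data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=30, random_state=0)

accuracies = []
for seed in range(10):                  # ten randomized repetitions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, shuffle=True)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

print(f"mean accuracy over 10 splits: {sum(accuracies) / len(accuracies):.2%}")
```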
4) Base learners: Five learning algorithms were used to build the model, namely NaiveBayes,
SMO, ZeroR, J48 and RandomForest.
Table 8: Experiment 4 confusion matrix results.

Algorithm   Learning Machine   Accuracy   Precision   Recall   F-Measure
Bayes       NaiveBayes         73.48%     0.716       0.735    0.711
Functions   SMO                74.85%     0.737       0.749    0.716
Rule        ZeroR              69.72%     0.486       0.697    0.573
Tree        J48                72.44%     0.722       0.724    0.723
Tree        RandomForest       75.30%     0.739       0.753    0.736
Figure 6: Experiment 4 accuracy result.
Figure 7: Experiment 4 detailed accuracy result.
5) Evaluation: The training data were first used to build the model, which was then evaluated
on the held-out test data from the percentage split. For performance measurement, the
confusion matrix was obtained to estimate four measures: accuracy, sensitivity, specificity
and F-measure. Table 8 shows that SMO and RandomForest have almost equal accuracy,
around 75%. Even after testing the model on the test datasets, SMO and RandomForest still
have the highest accuracy among the techniques. A practical advantage of SMO here is that
its memory requirement is linear in the training set size, which allows it to handle very large
training sets [13].
6.5. Experiment 5: Total Number of Medical Lab Tests Discriminative Attributes
1) Experiment Goal: The goal of this experiment was to select the most discriminative
attributes, i.e. a much smaller set of attributes that can describe the model almost as
well. In this experiment we worked to get the most out of the total-number-of-lab-tests
data, to gain a better understanding of the patient deterioration problem.
2) Building the Dataset: In this experiment we used the same dataset that we used in
experiment 4.
3) Pre-processing: In this stage, we used feature selection to select the most discriminative
attributes. For feature selection, we used weka.attributeSelection.CfsSubsetEval from
WEKA [20].
• Attribute Subset Evaluator: CfsSubsetEval
• Search Method: BestFirst.
• Evaluation mode: evaluate all training data
4) Base learner: Applying CfsSubsetEval reduced the attributes to 26 selected attributes.
Now the goal was to compare the reduced dataset with the baseline experiment result.
We used the same five classification algorithms to construct the model, namely
NaiveBayes, SMO, ZeroR, J48 and RandomForest.
5) Evaluation: The accuracy results from this experiment and the fourth experiment are
compared in Table 9 and Table 10. In general, there was no enhancement; only J48
improved noticeably, by 1.38%.
Table 9: Experiment 5 confusion matrix results.

Algorithm   Learning Machine   Accuracy   Precision   Recall   F-Measure
Bayes       NaiveBayes         73.17%     0.709       0.732    0.702
Functions   SMO                73.68%     0.726       0.737    0.684
Rule        ZeroR              70.24%     0.493       0.702    0.580
Tree        J48                73.82%     0.726       0.738    0.730
Tree        RandomForest       74.65%     0.731       0.747    0.733
Table 10: Accuracy comparison between Experiment 4 & Experiment 5.

Algorithm   Learning Machine   Original count dataset   Reduced count dataset   Change
Bayes       NaiveBayes         73.48%                   73.17%                  -0.31%
Functions   SMO                74.85%                   73.68%                  -1.17%
Rule        ZeroR              69.72%                   70.24%                  +0.52%
Tree        J48                72.44%                   73.82%                  +1.38%
Tree        RandomForest       75.30%                   74.65%                  -0.65%
Figure 8: Accuracy comparison between Experiment 4 & Experiment 5.
6.6. Experiment 6: Feature Selection for Total Number of Medical Lab Tests
1) Experiment Goal: The goal of this experiment was to study the relationship between feature
selection and classification accuracy. Feature selection is a dimensionality reduction
technique that reduces the attribute space of a feature set. More precisely, this experiment
measures how many features are enough to give reasonable accuracy.
2) Building the Dataset: In this experiment we used the count dataset from experiment 4.
3) Pre-processing: In the pre-processing step, we built ten datasets depending on the number of
selected features. The first dataset contained only 10% of the total attributes. Then we
increased the total feature selections by 10% with each new dataset. For example, dataset 1
contained 10% of the total attributes, dataset 2 contained 20% of the total attributes, dataset 3
contained 30% of the total attributes and so on till dataset 10 contained all 100% of the total
attributes.
4) For feature selection, we used weka.attributeSelection.InfoGainAttributeEval from WEKA,
which computes the information gain of each attribute with respect to the class [20].
• Attribute Subset Evaluator: InfoGainAttributeEval
• Search Method: Ranker.
• Evaluation mode: evaluate on all training data
5) Base learner: After generating all reduced datasets, we used the J48 algorithm as a base
learner.
Table 11: Experiment 6 feature selection results.

% of Features Selected   # of Features Selected   Accuracy   Number of Leaves   Size of the Tree
10%                      62                       71.45%     237                473
20%                      124                      73.90%     250                499
30%                      186                      73.55%     247                493
40%                      248                      72.79%     252                503
50%                      310                      73.41%     252                503
60%                      371                      73.66%     254                507
70%                      433                      74.24%     254                507
80%                      495                      74.10%     254                507
90%                      557                      74.14%     265                529
100%                     619                      73.59%     259                517
Figure 9: Count dataset accuracy.
6) Evaluation: Each feature-reduced dataset went through 10-fold cross-validation for
evaluation. Figure 9 shows the accuracy of all count datasets, and the detailed values are
reported in Table 11. From the results we observe that selecting 60 to 70% of the attributes
gives the highest accuracy. This again suggests that not all features (i.e., lab tests) are
necessary to attain a highly accurate prediction of patient deterioration.
7. DISCUSSION
In the experiment we investigated the effect of feature selection in improving the prediction of
patient deterioration in the ICUs. We considered the lab tests as features. Thus, choosing a subset
of features would mean choosing the most important lab tests to perform. If the number of tests
could be reduced by identifying the most important tests, then we would also identify the
redundant tests. It should be noted that the feature selections were done without any domain
knowledge and without any intervention from medical experts. However, in the analysis we
would like to emphasize the merit of feature selection in choosing the best tests, which could be
further verified and confirmed by a medical expert.
First we compare the features selected from the two datasets, namely the average dataset
and the count dataset. Table 12 shows the 10 best features chosen by the two approaches and
highlights the common lab tests between the two approaches (i.e. using the average of tests and
count of tests). Table 13 shows more details about the common tests.
Table 12: Best ranked 10 lab tests from the 10% of selected features.

Average Dataset   Count Dataset
50177             50148
50090             50112
50060             50140
50399             50399
50386             50177
50440             50439
50408             50090
50439             50440
50112             50079
50383             50068
Table 13: Medical Lab Test Details.
LOINC is an abbreviation for Logical Observation Identifiers Names and Codes, a clinical
terminology standard for laboratory test orders and results [21]. ARUP Laboratories [22] is a
national clinical and anatomic pathology reference laboratory and a worldwide leader in
innovative laboratory research and development. We used their web pages and other sources to
clarify the medical lab tests in Table 13, as follows:
• UREAN (50177): This test is conducted using the patient’s blood. This test is
recommended to screen for kidney dysfunction in patients with known risk factors (e.g.
hypertension, diabetes, obesity, family history of kidney disease). The panel includes
albumin, calcium, carbon dioxide, creatinine, chloride, glucose, phosphorous, potassium,
sodium and BUN and a calculated anion gap value. Usually, the result is reported within
24 hours [22].
• CREAT (50090): This test is conducted using the patient’s blood. It is a screening test to
evaluate kidney function [22].
• INR(PT) (50399): This test is conducted using the patient’s blood by coagulation assay
[15].
• PTT (50440): This test is carried out to answer two main questions: does the patient have
antiphospholipid syndrome (APLS), and does the patient have von Willebrand disease? If
so, which type? It is carried out by mechanical clot detection [23].
• PT (50439): This test is conducted using the patient’s blood by coagulation assay [15].
• GLUCOSE (50112): This test is used to check glucose, which is a common medical
analytic measured in blood samples. Eating or fasting prior to taking a blood sample has
an effect on the result. Higher than usual glucose levels may be a sign of prediabetes or
diabetes mellitus [24].
The result of the top 10 selected features from the average dataset allowed us to build a
model using the J48 decision tree. This model would allow a medical professional to predict
the status of a patient in the ICU as follows:
For example, if the lab test (name: PTT, ID: 50440, LOINC: 3173-2) result value is <=
20.757143, then the probability is very high (772.0/22.0 in WEKA's leaf notation, i.e. 22 of 772
instances misclassified, ~97.2%) that the patient is going to die (class: 1). This model has
78.69% overall accuracy.
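Rule extraction of this kind can be illustrated with scikit-learn's export_text on a decision tree (the data, the feature name "PTT" and the threshold below are synthetic stand-ins, not the paper's model):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
ptt = rng.normal(30, 8, 500)          # fake PTT values
other = rng.normal(size=500)
y = (ptt <= 21).astype(int)           # class 1 iff PTT is below the threshold
X = np.column_stack([ptt, other])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Dump the learned rules; the tree should recover a split near PTT <= 21.
print(export_text(tree, feature_names=["PTT", "OTHER"]))
```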
8. CONCLUSION AND FUTURE WORK
The increasing amount of medical laboratory data represents a significant information resource
that can provide a foundation for improved understanding of patients’ critical. Data mining
supports this goal by providing a set of techniques designed to discover similarities and
relationships between data elements in large data sets.
Reducing frequent laboratory testing, with its care and financial implications, is a critical issue
in intensive care units. In this paper, we presented an approach to reduce the observation time in
the ICU by predicting patient deterioration at its early stages. We presented six experiments
investigating the effect of the average laboratory test value and the total number of laboratory
tests on predicting patient deterioration in the intensive care unit. We considered laboratory tests
as features; choosing a subset of features means choosing the most important lab tests to
perform.
For future work, we plan to carry out more experiments using bigger data. Big data analytics
could bring potential benefits in supporting the right decisions to enhance the efficiency,
accuracy and timeliness of clinical decision making in the ICU.
REFERENCES
[1] Federico Cismondi, Leo A. Celi, André S. Fialho, Susana M. Vieira, Shane R. Reti, Joao MC Sousa, and Stan N.
Finkelstein, “Reducing unnecessary lab testing in the ICU with artificial intelligence,” Int. J. Med. Inf., vol. 82,
no. 5, pp. 345–358, 2013.
[2] Yi Mao, Wenlin Chen, Yixin Chen, Chenyang Lu, Marin Kollef, and Thomas Bailey, “An integrated data mining
approach to real-time clinical monitoring and deterioration warning,” in Knowledge discovery and data mining,
2012, pp. 1140–1148.
[3] Masha Rouzbahman and Mark Chignell, “Predicting ICU Death with Summarized Data: The Emerging Health
Data Search Engine.,” KMD, 2014.
[4] Q. Liu, Sung, Andrew H, Ribeiro, Bernardete, and Suryakumar, Divya, “Mining the Big Data: The Critical
Feature Dimension Problem,” Adv. Appl. Inform. IIAIAAI 2014 IIAI 3rd Int. Conf. On, pp. 499–504, 2014.
[5] Joon Lee and David M. Maslove, “Using information theory to identify redundancy in common laboratory tests
in the intensive care unit,” BMC Med. Inform. Decis. Mak., vol. 15, no. 1, 2015.
[6] Yi-Zeng Hsieh, Mu-Chun Su, Chen-Hsu Wang, and Pa-Chun Wang, “Prediction of survival of ICU patients
using computational intelligence,” Comput. Biol. Med., vol. 47, pp. 13–19, 2014.
[7] J. Iavindrasana, G. Cohen, A. Depeursinge, H. Müller, R. Meyer, and A. Geissbuhler, “Clinical data mining: a
review,” Yearb Med Inf., pp. 121–133, 2009.
[8] Yvan Saeys, Iñaki Inza, and Pedro Larrañaga, “A review of feature selection techniques in bioinformatics,”
bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
[9] “An Introduction to Feature Selection - Machine Learning Mastery.” [Online]. Available:
http://machinelearningmastery.com/an-introduction-to-feature-selection/. [Accessed: 06-Sep-2015].
[10] S. Bouktif et al, “Ant Colony Optimization Algorithm for Interpretable Bayesian Classifiers Combination:
Application to Medical Predictions,” PLoS ONE, vol. 9, no. 2, 2014.
[11] X. Wu et al., “Top 10 algorithms in data mining,” Knowl. Inf. Syst., vol. 14, no. 1, pp. 1–37, 2008.
[12] Chitra Nasa and Suman, “Evaluation of Different Classification Techniques for WEB Data,” Int. J. Comput.
Appl., vol. 52, no. 9, 2012.
[13] John C. Platt, “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,”
Adv. Kernel Methods—support Vector Learn., vol. 3, 1999.
[14] Leo Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[15] “MIMIC II Database.” [Online]. Available: https://mimic.physionet.org/database.html. [Accessed: 20-Aug-
2015].
[16] Lee J, Govindan S, Celi L, Khabbaz K, and Subramaniam B, “Customized prediction of short length of stay
following elective cardiac surgery in elderly patients using a genetic algorithm,” World J Cardiovasc Surg, vol.
3, no. 5, pp. 163–170, Sep. 2013.
[17] Lehman LH, Saeed M, Talmor D, Mark R, and Malhotra A, “Methods of blood pressure measurement in the
ICU,” Crit Care Med, vol. 41, no. 1, pp. 34–40, 2013.
[18] Lehman L, Long W, Saeed M, and Mark R, “Latent topic discovery of clinical concepts from hospital discharge
summaries of a heterogeneous patient cohort,” in Proceedings of the 36th International Conference of the IEEE
Engineering in Medicine and Biology Society, 2014.
[19] “Laboratory Test Reference Ranges | Calgary Laboratory Services.” [Online]. Available:
https://www.calgarylabservices.com/lab-services-guide/lab-reference-ranges/. [Accessed: 03-Sep-2015].
[20] “Feature Selection Package Documentation.” [Online]. Available:
http://featureselection.asu.edu/documentation/infogain.htm. [Accessed: 04-Sep-2015].
[21] “LOINC Codes - Mayo Medical Laboratories.” [Online]. Available:
http://www.mayomedicallaboratories.com/test-catalog/appendix/loinc-codes.html. [Accessed: 10-Sep-2015].
[22] “ARUP Laboratories: A National Reference Laboratory.” [Online]. Available: http://www.aruplab.com/.
[Accessed: 10-Sep-2015].
[23] “UCSF Departments of Pathology and Laboratory Medicine | Lab Manual | Laboratory Test Database | Activated
Partial Thromboplastin Time.”
[Online]. Available: http://labmed.ucsf.edu/labmanual/db/data/tests/802.html. [Accessed: 10-Sep-2015].
[24] “2345-7.” [Online]. Available: http://s.details.loinc.org/LOINC/2345-7.html?sections=Comprehensive.
[Accessed: 10-Sep-2015].
AUTHORS
Noura Al Nuaimi is pursuing a PhD in Information Technology with Dr Mohammad Mehedy Masud at United Arab
Emirates University (UAEU). She holds an MSc in Business Administration from Abu Dhabi University and a BSc in
Software Engineering from UAEU. Her research interests focus on data mining and knowledge discovery, cloud
computing, health information systems, search engines and natural language processing. She has published research
papers in IEEE Computer Society and IEEE Xplore.
Dr Mohammad Mehedy Masud is currently an Assistant Professor at the United Arab Emirates University (UAEU). He
joined the College of Information Technology at UAEU in spring 2012. He received his PhD from University of Texas
at Dallas (UTD) in December 2009. His research interests are in data mining, especially data stream mining and big
data mining. He has published more than 30 research papers in journals including IEEE Transactions on Knowledge
and Data Engineering (TKDE), Journal of Knowledge and Information Systems (KAIS), ACM Transactions on
Management Information Systems (ACM TMIS) and peer-reviewed conferences including IEEE International
Conference on Data Mining (ICDM), European Conference on Machine Learning (ECML/PKDD) and Pacific Asia
Conference on KDD. He is the principal inventor of a US patent application and lead author of the book “Data Mining
Tools for Malware Detection”. Dr Masud has served as a program committee member of several prestigious
conferences and has been serving as the official reviewer of several journals, including IEEE TKDE, IEEE TNNLS and
DMKD. During his service at the UAEU he has secured several internal and external grants as PI and co-PI.
Farhan Mohammed is a graduate from the College of Information Technology in United Arab Emirates University
specializing in Information Technology Management. He obtained his Bachelor’s in Management Information Systems
from United Arab Emirates University, Al Ain, UAE. He has worked under several professors and published four
conference papers and a journal paper for IEEE sponsored conferences. Currently he is working as a research assistant
in data mining in the health industry to develop models on health deterioration prediction. His area of interests lies in
smart cities, UAVs, data mining, and image and pattern recognition.