HQR Framework optimization for predicting patient treatment time in big data

IDL - International Digital Library Of
Technology & Research
Volume 1, Issue 6, June 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
IDL - International Digital Library, Copyright@IDL-2017
PRATEEKSHA S KULKARNI¹
Co-Guide: Shanthi M B
¹Computer Science and Engineering, CMRIT Bengaluru
Email: ¹kulkarniprateeksha51@gmail.com
Contact Number: +91-8553926003
Abstract: Today most hospitals are overcrowded, with patients waiting in long queues for different tasks. Hospital management
finds it difficult to handle these patients and to provide an optimal treatment time for each patient waiting in a queue.
Unnecessary and annoying waits for long periods result in substantial waste of human resources and time and increase the
frustration endured by patients. It would be convenient and preferable if patients could receive the most efficient
treatment plan and know the predicted waiting time in real time. Because of the large-scale, realistic data set and the
requirement for real-time response, the Patient Treatment Time Prediction (PTTP) algorithm and the Hospital
Queuing-Recommendation (HQR) system must be efficient and respond with low latency.
Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model in
recommending an effective and convenient treatment plan that minimizes patients' wait times in hospitals.
Keywords: Apache Spark, Hospital queuing recommendation, Big Data, Cloud Computing, Patient treatment time
prediction, Classification and regression tree.
1. INTRODUCTION
Today most hospitals are overcrowded, with long
queues of patients and ineffective queue
management. Managing patient queues and
predicting waiting times is a complicated and
difficult job, because each patient may need to go
through different tasks or operations during
treatment, such as a checkup, various tests (for
example a blood test, X-ray, CT scan, or MR scan),
and payment. We refer to each of these as a
treatment task, that is, a task to be performed by an
individual patient. A patient in a hospital is usually
required to undergo several examinations,
inspections, or tests depending on his or her
condition. Some of these tasks are independent,
whereas others depend on earlier tasks and must
wait for them to be completed. Most people who
come for a checkup must wait in queues for
unpredictable but long periods before their turn
comes to complete their checkup and treatment
tasks.
The main focus of this work is to help
patients complete their treatment tasks in a
predictable and optimal time, and to help hospitals
schedule each treatment task queue so as to avoid
overcrowded and ineffective queues. We use
training data from different hospitals to develop a
patient treatment time model that estimates the
average time required for each treatment task. For
this analysis we retrieve patient data gathered from
different hospitals and consider a few important
parameters: the start time and end time of each
treatment task, the patient's age, and any other
detailed treatment data for each task that is needed
to calculate the optimal time.
We use a treatment time model algorithm and
a hospital queuing system. Given the real-time
requirements of treatment, the huge volume of data,
and the complexity of the system, we run both in a
big data environment. The algorithm is implemented
with a treatment time model based on the Random
Forest (RF) method for each treatment task
performed during a patient's visit; the time consumed
by each task is analyzed, and the model predicts the
average time required for each individual task. The
hospital queuing recommendation then defines a
convenient treatment plan for each patient and task.
Patients can check their treatment plan and the
predicted waiting time in real time using a mobile
application developed for this purpose. Extensive
experimental results show that the time prediction
algorithm and the Random Forest implementation
are both effective and efficient.
2. EXPERIMENTAL DETAILS
2.1. Problem Statement
Most of the data in hospitals are unstructured,
massive, and high dimensional. Every day hospitals
produce a huge amount of business data, which
contain a great deal of information about individual
patients, such as medicine data, doctor name, and
other detailed information.
The time consumption of the treatment tasks
in each department might not lie in the same range;
it can vary with the content of the task, the
circumstances, the period, and the condition of the
patient. For example, a CT scan generally takes
longer for an elderly patient than for a young one.
Hospital queuing recommendation and management
also have strict time requirements, so the execution
speed of the HQR model and the PTTP model is
critical. The realistic patient data collected from
various hospitals are analyzed carefully and
rigorously based on important parameters such as
patient treatment start time, end time, patient age,
and the detailed treatment content of each task. We
identify and calculate different waiting times for
different patients based on the operations performed
during their treatment.
We use the RF algorithm to train the patient
treatment time consumption based on both patient
and time characteristics, and then build the PTTP
model. The overall logical structure of the project is
divided into processing modules, and the conceptual
data structure is defined in the architectural data flow
diagram shown in Figure 2.1.
Fig 2.1: Architecture of the HQR system
2.2. Data Pre-processing
In the preprocessing phase, hospital treatment data
from different treatment tasks are gathered. A
substantial number of patients visit each hospital
every day. We collect the data from different
hospitals to analyze the treatment time required for
each task. Let S be the set of patients in a hospital,
where si denotes a registered patient and his or her
information. Assume that there are N patients in S:
S = {s1, s2, ..., sN},
where each patient si has specific unchanging
parameters, e.g., name, ID, gender, age, and address.
Some of these parameters are used for our analysis,
whereas others are not. Each patient can undergo
multiple treatment tasks depending on his or her
health condition. Let X|si be the set of treatment
tasks for patient si during a specific visit:
X|si = {x1, x2, ..., xK},
where each task record xi consists of multiple items
of information Y, e.g., task name, task location,
department, start time, end time, doctor, and
attending staff:
Y|xi = {y1, y2, ..., yM},
where yj is a feature variable of the record of
treatment task xi. The records collected in this form,
as shown in Table 1, are used for calculating the
average treatment time.
Table 1: Example of treatment records
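The notation above can be mirrored in a small data structure. The following is a minimal sketch, not the system's actual schema: the class names `Patient` and `TaskRecord`, the field names, and the sample values are hypothetical and chosen only to match the parameters listed in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TaskRecord:
    """One treatment-task record x_i; its fields play the role of the features y_j."""
    task_name: str
    department: str
    doctor: str
    start_time: datetime
    end_time: datetime


@dataclass
class Patient:
    """One registered patient s_i with fixed attributes and the tasks of one visit."""
    patient_id: str
    gender: str
    age: int
    tasks: List[TaskRecord] = field(default_factory=list)  # the set X|s_i


# S = {s_1, ..., s_N}: here N = 1, and the visit has K = 2 task records.
# All values below are made-up illustration data.
patients = [
    Patient("P001", "F", 42, [
        TaskRecord("Blood test", "Laboratory", "Dr. A",
                   datetime(2017, 6, 1, 9, 0), datetime(2017, 6, 1, 9, 12)),
        TaskRecord("CT scan", "Radiology", "Dr. B",
                   datetime(2017, 6, 1, 9, 30), datetime(2017, 6, 1, 9, 55)),
    ]),
]
```

Keeping the fixed patient attributes separate from the per-visit task records follows the split between si and X|si in the definitions above.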
2.3. Workflow of data pre-processing
The data pre-processing workflow consists of the
following steps:
a: Collect data from different treatment tasks
According to statistics, the number of patients in a
medium-sized hospital lies in the range of 8,000 to
12,000 per day, and the number of treatment data
records ranges from 120,000 to 200,000. These data
are gathered from different treatment tasks and
include all the information related to each task.
b: Choose the same dimensions of the data
The hospital treatment data generated by different
treatment tasks have different fields, contents,
formats, and dimensions. To train the time
consumption model for each task, we choose the
same features from these data: the patient
information (patient ID, gender, age, etc.), the
treatment task information (task name, department
name, doctor name, etc.), and the time information
(start time and end time). Other features or
dimensions of the treatment data, such as patient
name and address, are ignored because they are of
little use to the PTTP algorithm.
c: Calculate new feature variables of the data
To train the PTTP model on the chosen data, several
new features must be calculated, such as the time
consumption of each treatment record, the day of the
week of the treatment, and the time range of the
treatment.
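Steps b and c can be sketched in a few lines. This is an illustrative sketch under assumptions: the dictionary keys, the helper names `select_fields`, `time_range`, and `add_features`, and the morning/afternoon/evening boundaries are hypothetical, since the text does not specify how the time range is defined.

```python
from datetime import datetime

# Fields kept for training (step b); patient name and address are dropped.
KEPT_FIELDS = ("patient_id", "gender", "age", "task_name",
               "department", "doctor", "start_time", "end_time")


def select_fields(raw: dict) -> dict:
    """Project a raw record onto the common set of fields."""
    return {name: raw[name] for name in KEPT_FIELDS}


def time_range(hour: int) -> str:
    """Coarse time-of-day bucket; the boundaries are an assumption."""
    if hour < 12:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


def add_features(record: dict) -> dict:
    """Add the derived variables of step c to a projected record."""
    start, end = record["start_time"], record["end_time"]
    out = dict(record)
    out["duration_min"] = (end - start).total_seconds() / 60.0  # time consumption
    out["day_of_week"] = start.weekday()                        # Monday = 0
    out["time_range"] = time_range(start.hour)
    return out


# Made-up raw record for illustration only.
raw = {
    "patient_id": "P001", "patient_name": "(omitted)", "address": "(omitted)",
    "gender": "F", "age": 42, "task_name": "CT scan",
    "department": "Radiology", "doctor": "Dr. B",
    "start_time": datetime(2017, 6, 1, 9, 30),
    "end_time": datetime(2017, 6, 1, 9, 55),
}
features = add_features(select_fields(raw))
```

For this sample record the derived time consumption is 25 minutes, the day of week is 3 (a Thursday), and the time range is "morning".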
The workflow of the patient wait and
treatment model is illustrated in Figure 2.2, which
shows the task flow for different patients. Consider
three patients (Patient1, Patient2, and Patient3) and
the set of treatment tasks required for each of them.
Fig 2.2: Flow diagram of the patient wait and
treatment model
Some tasks depend on a previous one, e.g., surgery
or bandaging cannot be done before X-rays. Tasks
{A, B, D} are required for Patient1, where task D
must wait for the completion of B. Tasks {E, B, C, A}
are required for Patient2, and tasks {D, E, C} are
required for Patient3. Moreover, a different number
of patients is waiting in the queue of each task, for
example 7 patients in the queue of task A and 5
patients in the queue of task B. In this paper, a Patient
Treatment Time Prediction (PTTP) model is trained
on hospitals' historical data. PTTP predicts the
treatment time of every patient in the current queue
of a task, and the waiting time of the task is the sum
of these predicted times. Then, according to each
patient's requested treatment tasks, a Hospital
Queuing-Recommendation (HQR) system
recommends an efficient and convenient treatment
plan with the least waiting time for the patient.
The treatment time consumption of each
patient in the current waiting queue is estimated by
the trained PTTP model. The total waiting time of
each task at the current time can then be predicted,
for example {TA = 35 min; TB = 30 min; TC = 70
min; TD = 24 min; TE = 87 min}. Finally, the tasks
of each patient are sorted in ascending order of
waiting time, except for the dependent tasks.
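The ordering rule in the example above can be sketched as a greedy procedure. This is one possible reading, not necessarily the HQR system's exact rule: the function names `queue_wait` and `recommend` are hypothetical, a dependent task is assumed to become eligible as soon as its prerequisite has been placed in the plan, dependencies are assumed to be acyclic, and the waiting times are treated as fixed rather than updated in real time.

```python
from typing import Dict, List, Set


def queue_wait(predicted_minutes: List[float]) -> float:
    """Waiting time of one task queue: sum of the predicted treatment times in it."""
    return sum(predicted_minutes)


def recommend(tasks: Set[str], wait: Dict[str, float],
              depends_on: Dict[str, str]) -> List[str]:
    """Greedy plan: repeatedly pick the eligible task with the shortest wait."""
    plan: List[str] = []
    remaining = set(tasks)
    while remaining:
        # A task is eligible when its prerequisite is not still pending.
        ready = [t for t in remaining if depends_on.get(t) not in remaining]
        nxt = min(ready, key=lambda t: (wait[t], t))  # ties broken by task name
        plan.append(nxt)
        remaining.remove(nxt)
    return plan


# Waiting times from the example in the text (minutes).
wait = {"A": 35, "B": 30, "C": 70, "D": 24, "E": 87}
plan1 = recommend({"A", "B", "D"}, wait, {"D": "B"})   # D must wait for B
plan2 = recommend({"E", "B", "C", "A"}, wait, {})
plan3 = recommend({"D", "E", "C"}, wait, {})
print(plan1, plan2, plan3)
```

With these inputs the sketch yields B, D, A for Patient1 (D has the shortest queue but is held back until B is done), B, A, C, E for Patient2, and D, C, E for Patient3.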

2.4. PTTP based on the improved random
forest model
In the preprocessing phase, the hospital treatment
data from different treatment tasks are gathered,
since a substantial number of patients visit each
hospital every day. After the new feature variables of
the treatment data have been calculated, erroneous
data need to be removed. Treatment records with
missing values for critical features, such as patient
gender, patient age, and task name, are removed as
incomplete data. Treatment records with a negative
time consumption are removed as inconsistent data;
this happens when the recorded end time of a
treatment operation is before its start time, which
can occur when the start time is recorded by a human
and the end time by a machine. Both types of records
are considered noisy data.
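The two cleaning rules can be expressed as a simple filter. The sketch below assumes records are dictionaries with hypothetical keys `gender`, `age`, `task_name`, `start_time`, and `end_time`; the function name `clean` and the sample records are illustrative only.

```python
from datetime import datetime
from typing import Iterable, List

# Critical features named in the text; a record missing any of them is incomplete.
CRITICAL = ("gender", "age", "task_name")


def clean(records: Iterable[dict]) -> List[dict]:
    """Drop incomplete records and records whose time consumption is negative."""
    kept = []
    for r in records:
        if any(r.get(name) is None for name in CRITICAL):
            continue  # incomplete data
        if r["end_time"] < r["start_time"]:
            continue  # inconsistent data: end time before start time
        kept.append(r)
    return kept


# Made-up records: one valid, one with a missing age, one with a negative duration.
t0, t1 = datetime(2017, 6, 1, 9, 0), datetime(2017, 6, 1, 9, 20)
records = [
    {"gender": "F", "age": 42, "task_name": "CT scan", "start_time": t0, "end_time": t1},
    {"gender": "M", "age": None, "task_name": "X-ray", "start_time": t0, "end_time": t1},
    {"gender": "M", "age": 67, "task_name": "X-ray", "start_time": t1, "end_time": t0},
]
cleaned = clean(records)
```

Only the first record survives; the second is dropped as incomplete and the third as inconsistent.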
Figure 2.3 represents the PTTP model
based on CART trees. It takes the training data from
the data set as input and computes the divisions of
the tasks by age group and task, as described in
Algorithm 1. Finally, it computes the average time
of each task for a patient.
Algorithm 1: Process of the PTTP algorithm based
on the random forest
Input:
STrain: the training data set;
k: the number of CART trees in the RF model.
Output:
PTTPRF: the PTTP model based on the RF
algorithm.
for i = 1 to k do
    create training subset Strain_i ← sampling(STrain);
    create OOB subset SOOB_i ← (STrain − Strain_i);
    create an empty CART tree hi;
    for each independent variable ai in Strain_i do
        calculate candidate split points of ai;
        for each candidate split point vp of ai do
            calculate the best split point
            (ai, vp) ← arg min (∑ Left + ∑ Right);
        end for
        append node Node(ai, vp) to hi;
        split data for left branch RL(ai, vp) ← [x | ai ≤ vp];
        split data for right branch RR(ai, vp) ← [x | ai > vp];
        for each data R in {RL(ai, vp), RR(ai, vp)} do
            calculate ɸ(vp(L|R) | ai) ← max ɸ(vp, ai);
            if ɸ(vp(L|R) | ai) ≥ ɸ(vp | ai) then
                append subnode Node(ai, vp(L|R)) to
                Node(ai, vp) as multi-branch;
                split data to two forks RL and RR;
            else
                collect cleaned data for leaf node Dleaf;
                calculate mean value of leaf node
                c ← (1/k) ∑ Dleaf;
            end if
        end for
    end for
end for
return PTTPRF
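Two building blocks of the algorithm, bootstrap sampling with an out-of-bag (OOB) subset and the search for the best split point of one variable, can be sketched as follows. This is a simplified illustration and not the improved multi-branch procedure itself: it assumes that the quantity minimized at each split is the sum of squared errors of the left and right branches, as in a standard CART regression tree, and the function names `bootstrap`, `sse`, and `best_split` and the sample data are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample n row indices with replacement; rows never drawn form the OOB subset."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    oob = [i for i in range(n) if i not in chosen]
    return train, oob


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty branch)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (split point, loss) minimising SSE(left) + SSE(right) for one variable."""
    best = (float("nan"), float("inf"))
    for vp in sorted(set(xs))[:-1]:                 # candidate split points
        left = [y for x, y in zip(xs, ys) if x <= vp]
        right = [y for x, y in zip(xs, ys) if x > vp]
        loss = sse(left) + sse(right)
        if loss < best[1]:
            best = (vp, loss)
    return best


# Made-up data: patient age versus treatment time in minutes for one task.
ages = [20, 25, 30, 60, 65, 70]
minutes = [10, 11, 12, 20, 21, 22]
split_age, loss = best_split(ages, minutes)
train_idx, oob_idx = bootstrap(len(ages), random.Random(0))
```

On this sample the best split falls at age 30, separating the younger group (mean 11 minutes) from the older group (mean 21 minutes); the leaf means then serve as the predicted treatment times.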
3. RESULTS AND DISCUSSION
The following snapshots and graphs show the
outputs obtained after step-by-step execution of the
proposed service application when a new patient
uses the service to check availability and book an
appointment. The result is displayed on the patient's
output screen together with the optimal time
calculated by the procedures above.
Fig 3.1: The test result of the above model
displaying the time for each patient for each task.
Figure 3.1 shows the time details, which include the
start time and end time of each task along with the
doctor's name. In the doctor's login, the doctor can
view the list of patients who have requested an
appointment with that doctor.
Fig 3.2: The appointment list in the doctor login
The doctor can log in to this application and check
the list of patients who have requested a visit, as
shown in Figure 3.2.
Fig 3.3: Graph showing the average time vs. patient
age
Figure 3.3 plots the average time against the age of
the patient. From it we can analyze the minimum
average time required for each task that a patient
requests when asking for an appointment.
CONCLUSIONS
A hospital queuing treatment plan that uses the
PTTP algorithm in a big data environment has been
presented in this project.
1. A random forest technique is used by the
patient treatment time prediction algorithm
to provide the optimal result.
2. The proposed system is developed to
produce the optimal time for different tasks,
giving patients a more efficient and
convenient plan.

