The motivation behind this research is to predict blood donors, helping medical professionals forecast the future demand and supply of blood, plan accordingly with blood banks, and encourage voluntary blood donors to meet demand.
I used data mining and machine learning techniques at the decision-making level, applying classification models to predict who will be a prospective donor to blood transfusion centers and blood banks in different time periods.
3. Outline
Introduction
Business Understanding
Data Preparation
Data Understanding
Model Building
Model Performance Evaluation
Conclusion and Future Scope
4. Introduction
Despite enormous scientific advances and great developments in medical science, an adequate
supply of healthy blood remains a challenge and a concern for the medical community worldwide. Blood donation plays
a critical role in preserving human health and life. Maintaining the required volume of
blood in the blood banks of each region, across diverse blood groups and the relationships between them, and
given that some blood groups are rarer, makes predicting and planning blood donation increasingly
complicated and important over time. Applying data mining to hospital and blood transfusion centre databases helps
uncover relations in past information that can be used to predict the future.
Blood demand is increasing day by day due to accidents, surgeries, and other causes.
The motivation behind this research is to predict blood donors, helping medical professionals forecast the future demand and
supply of blood, plan accordingly with blood banks, and encourage voluntary blood donors to meet demand.
5. Business Understanding
• In this project, we apply data mining and machine learning techniques at the decision-making level, using
classification models to predict who will be a prospective donor to blood transfusion centres and
blood banks in different time periods.
• The ability to identify regular blood donors will enable blood banks and voluntary organizations to plan
blood donation camps systematically and effectively.
• The sole purpose of this project is to find prospective blood donors, thereby increasing the efficiency of the
business and saving the organisation time and money.
• To this end, we consider supervised classification algorithms for the prediction; a
decision tree algorithm is implemented and its accuracy results are presented.
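The slides do not say which tool or split criterion was used for the decision tree, so the following is only a minimal pure-Python sketch of the general technique, assuming Gini impurity as the split criterion. The function names, the `max_depth` parameter, and the toy rows (recency, frequency, time) are all hypothetical and are not taken from the donor database.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a small tree; each leaf stores the majority class of its subset."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: (recency, frequency, time) and a 0/1 donation label.
toy_rows = [(2, 10, 40), (1, 8, 30), (3, 12, 60), (20, 2, 25), (30, 1, 30), (14, 3, 50)]
toy_labels = [1, 1, 1, 0, 0, 0]

tree = build_tree(toy_rows, toy_labels)
```

On this toy data a single split on recency separates the two classes. Real donor data is noisier, so a library implementation with pruning and a held-out test set would normally be preferred.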
6. Data Understanding
This study uses the donor database of the Blood Transfusion Service Centre in Hsin-Chu City, Taiwan. The
centre sends its blood transfusion service bus to one university in Hsin-Chu City to collect donated blood
about every three months.
The Blood Transfusion Service Centre drives to different universities and collects blood as part of a blood
drive.
To demonstrate the RFMTC marketing model (a modified version of RFM), we selected 748 donors at random
from the donor database.
7. Data Description
This data set contains 5 attributes:
R (Recency - months since last donation),
F (Frequency - total number of donations),
M (Monetary - total blood donated in c.c.),
T (Time - months since first donation),
C (a binary variable representing whether the donor gave blood in March 2007; 1 stands for donating blood, 0 stands for not donating blood).
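As a minimal sketch, the five attributes can be written down as a typed record. The class name `DonorRecord`, the field names, and the sample row are illustrative assumptions; only the meaning of each attribute comes from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DonorRecord:
    recency: int    # R: months since last donation
    frequency: int  # F: total number of donations
    monetary: int   # M: total blood donated, in c.c.
    time: int       # T: months since first donation
    donated: bool   # C: True if the donor gave blood in March 2007

def parse_row(row):
    """Convert one row of five integer-like strings into a DonorRecord."""
    r, f, m, t, c = (int(value) for value in row)
    if c not in (0, 1):
        raise ValueError(f"C must be 0 or 1, got {c}")
    return DonorRecord(r, f, m, t, bool(c))

# Hypothetical row, for illustration only.
example = parse_row(["2", "50", "12500", "98", "1"])
```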
8. Data Exploration
All the attributes are of integer data type.
However, we had to change the type of ‘C’ from integer to binomial for the classification model.
The data set consists of 748 donors selected at random from the donor database.
No missing values were observed while preparing the data.
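The two checks described here, counting missing values and converting ‘C’ to a two-valued class, can be sketched as follows. The slides do not name the tool that was used, so this uses only the Python standard library; the CSV header names and the four sample rows are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample in CSV form; the column names R, F, M, T, C are assumed.
SAMPLE_CSV = """R,F,M,T,C
2,50,12500,98,1
0,13,3250,28,1
1,16,4000,35,0
4,4,1000,4,0
"""

def load_rows(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Count empty or absent cells per column."""
    columns = rows[0].keys()
    return {col: sum(1 for row in rows if row[col] in (None, "")) for col in columns}

def to_binomial(rows, label="C"):
    """Map the integer label 0/1 to a two-valued (boolean) class."""
    return [row[label] == "1" for row in rows]

rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)
labels = to_binomial(rows)
```

With the real file, `missing` would be expected to contain only zeros, matching the observation that no values are missing.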
9. Recency
The column ranges from 0 to 74 months;
375 blood donation incidents were
observed in the last 7 months.
12. Time
The Time attribute, which represents the
months since the donor’s first donation,
ranges from a minimum of 2 months to a
maximum of 98 months.
13. Whether he/she donated Blood
The label attribute, whether the donor
gave blood in March 2007, is
represented in binary format; 570 of the
748 donors did not donate blood in March
2007.
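The two counts above imply an imbalanced label, which is worth quantifying before reading any accuracy figure. The short calculation below uses only the numbers stated on this slide. The "majority-class baseline" is the accuracy of a hypothetical classifier that always predicts "did not donate", computed here on the full data set rather than on any particular test split.

```python
TOTAL_DONORS = 748  # from the slide
NOT_DONATED = 570   # from the slide

donated = TOTAL_DONORS - NOT_DONATED
positive_rate = donated / TOTAL_DONORS
majority_baseline = NOT_DONATED / TOTAL_DONORS

print(f"donated in March 2007: {donated}")
print(f"share of donors who donated: {positive_rate:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

This gives 178 donors who donated (about 23.8%) and a baseline of about 76.2%, a useful reference point when judging a classifier on this data.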
14. Correlation
In this data set, the Monetary and Frequency
columns are strongly correlated.
No correlation was observed between any other
columns.
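A correlation check of this kind can be sketched with a plain Pearson coefficient. The values below are hypothetical, and the monetary column is built on the assumption of a fixed 250 c.c. per donation, which is one way a near-perfect Frequency/Monetary correlation could arise; the slides do not state the volume per donation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values; monetary assumes a fixed 250 c.c. per donation.
frequency = [50, 13, 16, 4, 7]
monetary = [f * 250 for f in frequency]
recency = [2, 0, 1, 4, 14]

r_fm = pearson(frequency, monetary)  # proportional columns, so close to 1.0
r_fr = pearson(frequency, recency)
```

When two columns are this strongly correlated, one of them adds little information for a classifier, so dropping either is a reasonable option to consider.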
16. Data Preparation
• Data Selection: The primary goal of this data mining project is to predict whether a donor is going to donate blood
or not. Given that goal, all the attributes are essential, as there are no quality or technical constraints.
The Recency, Frequency and Monetary attributes indicate the donor’s interest, blood donation
frequency and physical status, which are essential in understanding the donor’s decision at the time.
• Clean Data: The data we are using is clean, with zero missing values. We checked it thoroughly and
found no repeated values either.
• Construct Dataset: Given the data mining goal, we feel that this dataset does not need any
additional features/attributes to be constructed manually.
• Integrate Data: As the dataset is downloaded from one source and is in one file, integration is not required.
19. Conclusion & Future Scope
• From the results, after applying a decision tree
we obtained an accuracy of 75.4 percent, which we attribute to the very low
correlation between the attributes of the data set.
• Future work will focus on applying more classification
algorithms from data mining.
• The performance of an algorithm depends on the domain
and type of data set.
• Hence the focus will be on applying more machine learning
algorithms to increase the accuracy of the results.
• We also plan to create a dashboard to manage the status of blood
donors, which will help blood transfusion centers manage
the blood bank.
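When comparing further algorithms, it may help to report more than accuracy, since a model can score well overall while missing most actual donors. The sketch below computes confusion-matrix counts, accuracy, and recall. The label and prediction lists are hypothetical and are not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means 'donated'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual donors that the model identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
```

In this toy case the accuracy is 0.75 while only one of the three actual donors is found, which is the kind of gap that accuracy alone would hide.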