Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309 Data Mining Techniques and Their Implementation in Blood Bank Sector –A Review Ankit Bhardwaj1, Arvind Sharma2, V.K. Shrivastava3 1 M.Tech Scholar, Department of CSE, HCTM, Kaithal, Haryana, India 2 Head, Department of Computer Science, DAV Kota, Rajasthan, India 3 Head, Department of I.T, HCTM, Kaithal, Haryana, IndiaABSTRACT and challenge for data mining. Data mining is a This paper focuses on the data mining process of the knowledge discovery in databases andand the current trends associated with it. It the goal is to find out the hidden and interestingpresents an overview of data mining system and information [1]. Various important steps areclarifies how data mining and knowledge involved in knowledge discovery in databasesdiscovery in databases are related both to each (KDD) which helps to convert raw data intoother and to related fields. Data Mining is a knowledge. Data mining is just a step in KDDtechnology used to describe knowledge discovery which is used to extract interesting patterns fromand to search for significant relationships such as data that are easy to perceive, interpret, andpatterns, association and changes among manipulate. Several major kinds of data miningvariables in databases. This enables users to methods, including generalization, characterization,search, collect and donate blood to the patients classification, clustering, association, evolution,who are waiting for the last drop of the blood pattern matching, data visualization, and meta-ruleand are nearby to death. We have also tried to guided mining will be reviewed. The explosiveidentify the research area in data mining where growth of databases makes the scalability of datafurther work can be continued. mining techniques increasingly important. Data mining algorithms have the ability to rapidly mineKeywords: Data mining, KDD, Association vast amount of data.Rules, Classification, Clustering, Prediction. The paper is organised as follows: InI. INTRODUCTION section 2 we use the data mining process and several In today’s computer age data storage has stages of the KDD. Section 3 presents the literaturebeen growing in size to unthinkable ranges that only survey. Section 4 discusses about the researchcomputerized methods applied to find information methodology to be proposed in our work. Finally,among these large repositories of data available to section 5 concludes the paper.organizations whether it was online or offline. Datamining was conceptualised in the 1990s as a means II. DATA MINING AND KDD PROCESSof addressing the problem of analyzing the vast Data mining is a detailed process of analyzing largerepositories of data that are available to mankind, amounts of data and picking out the relevantand being added to continuously. Data mining is information. It refers to extracting or miningnecessary to extract hidden useful information from knowledge from large amounts of data [2]. Datathe large datasets in a given application. This Mining is the fundamental stage inside the processusefulness relates to the user goal, in other words of extraction of useful and comprehensibleonly the user can determine whether the resulting knowledge, previously unknown, from largeknowledge answers his goal. The growing quality quantities of data stored in different formats, withdemand in the blood bank sector makes it necessary the objective of improving the decision ofto exploit the whole potential of stored data companies, organizations where the data can beefficiently, not only the clinical data and also to collected. However data mining and overall processimprove the behaviours of the blood donors. Data known as Knowledge Discovery from Databasesmining can contribute with important benefits to the (KDD), is usually an expensive process, especiallyblood bank sector, it can be a fundamental tool to in the stages of business objectives elicitation, dataanalyse the data gathered by blood banks through mining objectives elicitation, and data preparation.their information systems. In recent years, along This is especially the case each time data mining iswith development of medical informatics and applied to a blood bank. Data Mining can be definedinformation technology, blood bank information as the extraction or fetching of the relevantsystem grows rapidly. With the growth of the blood information i.e. Knowledge from the largebanks, enormous Blood Banks Information Systems repositories of data. That’s the reason it is also(BBIS) and databases are produced. It creates a need called as Knowledge Mining. However many 1303 | P a g e
  2. 2. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309synonyms are linked with Data Mining viz. Many people treat the data mining as a synonym forKnowledge Mining from Data, Knowledge another popularly used term, Knowledge Discoveryextraction, Data/ pattern analysis, Data archaeology from Data. The following figure (fig.1), shows theand Data dredging. Data Mining is also popularly data mining as simply an essential step in theknown as Knowledge Discovery in data bases process of KDD i.e. Knowledge Discovery from(KDD), refers to the nontrivial extraction of Data[3].implicit, previously unknown and potentially usefulinformation from data in Databases. Selection Preprocessing Transformation Transformed data Data Mining Preprocessed data INITIAL TARGET TARGET DATA DATA DATA PATTERNS KNOWLEDGE Evaluation Fig.1: Data mining is the core of KDD Process The KDD process includes selecting the 2.1 STEPS OF KDD PROCESSdata needed for data mining process & may be The knowledge discovery in databasesobtained from many different & heterogeneous data (KDD) process comprises of a few steps leadingsources. Preprocessing includes finding incorrect or from raw data collections to some form of newmissing data. There may be many different activities knowledge. The iterative process contains of theperformed at this time. Erroneous data may be following steps:corrected or removed, whereas missing data must besupplied. Preprocessing also include: removal of 1. Data cleaningnoise or outliers, collecting necessary information to It also known as data cleansing, it is amodel or account for noise, accounting for time fundamental step in which noise data and irrelevantsequence information and known changes. data are removed from the collection.Transformation is converting the data into acommon format for processing. Some data may be 2. Data integrationencoded or transformed into more usable format. In this step, multiple data sources, oftenData reduction, dimensionality reduction (e.g. heterogeneous, may be combined in a commonfeature selection i.e. attribute subset selection, source.heuristic method etc) & data transformation method 3. Data selection(e.g. sampling, aggregation, generalization etc) may At this step, the data relevant to thebe used to reduce the number of possible data values analysis is decided on and retrieved from the databeing considered. Data Mining is the task being collection.performed, to generate the desired result.Interpretation/Evaluation is how the data mining 4. Data transformationresults are presented to the users which are It is also known as data consolidation, it isextremely important because the usefulness of the a stage in which the selected data is transformed intoresult is dependent on it. Various visualization & forms appropriate for the mining procedure.GUI strategies are used at this step. Different kindsof knowledge requires different kinds of 5. Data miningrepresentation e.g. classification, clustering, It is the core step of KDD process in whichassociation rule etc.[4] clever techniques are applied to extract patterns potentially useful. 1304 | P a g e
  3. 3. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309 6. Pattern evaluation 4. Discovering Patterns and Rules: It concern with In this step, strictly interesting patterns pattern detection, the aim is spotting fraudulent representing knowledge are identified based on behavior by detecting regions of the space defining given measures. the different types of transactions where the data points significantly different from the rest. 7. Knowledge representation This is the final step in which the 5. Retrieval by Content: It is finding pattern discovered knowledge is visually represented to the similar to the pattern of interest in the data set. This user. This is a very essential step that uses task is most commonly used for text and image data visualization techniques to help users understand sets. and interpret the data mining results. III. LITERATURE SURVEY 2.2 DATA MINING MODELS In this section, lot of research works have been The data mining recorded from past few years. They are presented models and tasks are shown in figure 2 given below: here: [8] In June 2009, work is carried out on management Data Mining information system that helps managers in providing Models decision making in any organization. This MIS is basically all about the process of collecting, processing, storing and transmitting the relevant information that is just like the life blood of the organization that uses a data mining approach. Predictive Model Descriptive Model [7] In June 2009, the research work entitled on web (Supervised) (Unsupervised) based information system for blood donation, all the information regarding blood donation are available on world wide web i.e. online systems that communicate/interconnect all the blood donorClassification Clustering societies in a country using LAN Technology. Association Rule [6] In July 2009, the research work entitled onRegression Segmentation, Reconstruction and Analysis of BloodPrediction Sequence discovery thrombus formation in 3D-2 photons microscopyTime Series Analysis Summarization images. In this work, method and platform are applied to differentiate between the thromby formed Fig.2: Data Mining Models and Tasks in wild type and low FV11 mice. The high resolution quantitative structural analysis using some algorithms The predictive model makes prediction that provide a new matrix, that is likely to be critical about unknown data values by using the known to categorize and understanding bio-medically values. The descriptive model identifies the patterns relevant characteristics of thromby. Also, in this or relationships in data and explores the properties work, composition of different components on the of the data examined. clot surface and number of voxels in each clot component has been computed. 2.3 DATA MINING TASKS [5] In December 2009, the work proposed on an There are different types of data mining application to find spatial distribution of blood tasks depending on the use of data mining results. donors from the blood bank information system, These data mining tasks are categorised as follows: blood donation and transfusion services are carried out and help the patients to access the availability of 1. Exploratory Data Analysis: It is simply blood from anywhere. exploring the data without any clear ideas of what [9] In the year 2009, the research was based we are looking for. These techniques are interactive geographical variation in correlates of blood donor and visual. turn out rates-an investigation of Canadian metropolitan areas. In this work, Canadian blood 2. Descriptive Modeling: It describes all the data, it services i.e. an organization that aims the collecting includes models for overall probability distribution and distributing the blood supply across the country. of the data, partitioning of the p-dimensional space Accessibility variables are introduced and calculated into groups and models describing the relationships using a 2-step floating catchment area in order to between the variables. analyse the accessibility of services. The regression technique is also used. 3. Predictive Modeling: This model permits the [10] In the year 2010, the research is accomplished on value of one variable to be predicted from the application of CART algorithm in blood donor’s known values of other variables. classification. In this work, one of the popular data 1305 | P a g e
  4. 4. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309mining technique i.e. Classification is used and through [14] In year 2011, the work was proposed on anthe use of CART application, a model is created that intelligent system for improving performance ofdetermines the donor’s behavior. blood donation. In this work, many important[11] In the year 2010, the data mining tool is used to techniques i.e. Clustering, K-means classification isextract information from PPI (Protein-Protein adopted that improves the performance of bloodInteraction) systems and developed a PPI search donation services.system known as PP Look that is an effective tool [19] In January 2012, the data mining techniques are4-tier information extraction systems based on a full applied on medical databases and clinical databasessentence parsing approach. In this paper, researchers which are used to store huge amount of informationintroduced a useful tool that is PP Look which uses regarding patient’s diagnosis, lab test results,an improved keyword, dictionary pattern matching patient’s treatments etc. which is a way of mining thealgorithm to extract protein-protein interaction medical information for doctors and medicalinformation from biomedical literature. Some visual researchers. In all developed countries, it is foundmethods were adopted to conclude PPI in the form of that the routine health tests are very common among3D stereoscopic displays. all adults. It is also determined that precautionary[12] In December 2010, the work is proposed on measures are less expensive rather than thebinary classifiers for health care databases i.e. a treatments and also facilitates a better chance forcomparative study of data mining classification patient’s treatment at earlier stages. In this researchalgorithms in the diagnosis of breast cancer. This work, five medical databases are used andwork helps in uncovering the valuable knowledge experimental results are computed using data mininghidden behind them and also helping the decision software tool.makers to improve the health care services. In this [20] In February 2012, the work is carried outwork, the presented experiment provides medical through the Health Care Applications to diagnosedoctors and health care planers a tool to help them different diseases using Red Blood Cells counting. Inquickly make sense of vast clinical databases. this work, the researchers presented the automatic[15] In February 2011, the research paper presented process of RBC count from an image and some of theon classifying blood donors using data mining data mining techniques like Segmentation,techniques, the work is performed on blood group Equalization and K-means clustering are used fordonor data sets using classification technique. preprocessing of the images. Some diseases which[13] In August 2011, the research work carried out on are related to sickness of RBC count also predicted.interactive knowledge discovery in blood transfusion [21] In this work, the main purpose of the system isdata set, the work is done through conducting data to guide diabetic patients during the disease. Diabeticmining experiments that help the health professionals patients could benefit from the diabetes expert systemin better management of blood bank facility. by entering their daily glucoses rate and insulin[16] In year 2011, the work presented on rule dosages; producing a graph from insulin history;extraction for blood donors with fuzzy sequential consulting their insulin dosage for next day. It’s alsopattern mining, fuzzy sequential pattern algorithm is tried to determine an estimation method to predictused to extract rules from blood transfusion service glucose rate in blood which indicates diabetes risk. Incenter data set that predicts the behaviour of donor in this work, the WEKA data mining tool is used tothe future. classify the data and the data is evaluated using[17] In August 2011, the research work proposed on 10-fold cross validation and the results are compared.a comparison of blood donor classification datamining models, which uses decision tree to examine IV. RESEARCH METHODOLOGYthe blood donor’s classification. In this work, 4.1 Data Mining Techniques in Blood Bankcomparison between extended RVD based model and SectorDB2K7 procedures are carried out and it wasdiscovered that RVD classification is better than In our proposed methodology, we suppose a bloodDB2K7 in aspects of recalling and precision bank system has the following objectives:capability.  Increase blood donor’s rate and their attitude.[18] In the September 2011, the research work  Attract more blood donors to donate blood.carried out on real time blood donor management  Assist blood bank professionals in makingusing dash boards based on data mining models, the policy on the acquisition of blood donors anddata mining techniques are used to examine the blood new blood banks.donor classification and promote it to development of There are several major data mining techniques hadreal time blood donor management using dash boards been developed and used in data mining researchwith blood profile or RVD profile and geo-location projects recently including classification, clustering,data. association, prediction and sequential patterns. These techniques and methods in data mining need brief mention to have better understanding. 1306 | P a g e
  5. 5. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-13094.1.1 Classification specific target variable. Association rule has theThe estimation and prediction may be viewed as several algorithms like: Apriori, CDA, DDA,types of classification. The problem usually is interestingness measure etc. Association rules areevaluating the training data set and then we apply to if/then statements that help uncover relationshipsthe future model. between seemingly unrelated data in a relationalThe following table1 shows different classification database or other information repository. Analgorithms: example of an association rule can be ‘if a customer buys a dozen eggs, he is 80% likely to also purchase Type Name of algorithm milk.’ An association rule has two parts, an Statistical Regression antecedent (if) and a consequent (then). An Bayesian antecedent is an item found in the data. A Distance Simple distance consequent is an item that is found in combination K-nearest neighbors with the antecedent. Decision ID3 The association rules are created by analyzing data Tree C4.5 for frequent if/then patterns and using the CART criteria support and confidence to identify the most SPRINT important relationships. The support is an indication Neural Propagation of how frequently the data items appear in the Network NN Supervised learning database. The confidence indicates the number of Radial base function times the if/then statements have been found to be network true. Rule based Genetic rules from DT Types of Association Rule Techniques are as Genetic rules from NN follows: Genetic rules without  Multilevel Association Rule DT and NN  Multidimensional Association Rule Table 1: Classification Algorithms  Quantitative Association Rule4.1.2 Clustering 4.1.4 PredictionClustering technique is grouping of data, which is The prediction as it name implied is one of the datanot predefined. By using clustering technique we mining techniques that discovers relationshipcan identify dense and sparse regions in object between independent variables and relationshipspace. The following table2 shows different between dependent and independent variables. Forclustering algorithms: instance, prediction analysis technique can be used Type Name of algorithm in blood donors to predict the behaviour for the Similarity and Similarity & distance future if we consider donor is an independent distance measure variable, blood could be a dependent variable. Then measure based on the historical data, we can draw a fitted Outlier Outlier 3333 regression curve that is used for donor’s behaviour Hierarchical Agglomerative, Divisive prediction. Regression technique can be adapted for Partitional Minimum spanning tree predication. Regression analysis can be used to Squared matrix model the relationship between one or more K-means independent variables and dependent variables. In Nearest Neighbor data mining independent variables are attributes PAM already known and response variables are what we Bond Energy want to predict. Unfortunately, many real-world Clustering with Neural problems are not simply prediction. Network Types of Regression Techniques are as follows: Clustering large BIRCH  Linear Regression database DB Scan  Multivariate Linear Regression CURE  Nonlinear Regression Categorical ROCK  Multivariate Nonlinear Regression Table 2: Clustering Algorithms 4.1.5 Sequential Patterns4.1.3 Association Rule Sequential patterns analysis in one of data miningThe central task of association rule mining is to find technique that seeks to discover similar patterns insets of binary variables that co-occur together data transaction over a business period. The uncoverfrequently in a transaction database, while the goal patterns are used for further business analysis toof feature selection problem is to identify groups of recognize relationships among data.that are strongly correlated with each other with a 1307 | P a g e
  6. 6. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309 V. CONCLUSIONThis paper presents an overview of data mining and [11] Dr. Varun Kumar, Luxmiverma; ‘Binaryits techniques which have been used to extract classifiers for Health Care databases - Ainteresting patterns and to develop significant comparative study of data mining classificationrelationships among variables stored in a huge algorithms in the diagnosis of Breast cancer’,dataset. Data mining is needed in many fields to IJCST, Vol. I, Issue 2, December 2010.extract the useful information from the large amount [12] Vikram Singh and Sapna Nagpal; ‘Interactiveof data. Large amount of data is maintained in every Knowledge discovery in Blood Transfusion Datafield to keep different records such as medical data, Set’; VSRD International Journal of Computerscientific data, educational data, demographic data, Science and Information Technology; Vol. I, Issue 8,financial data, marketing data etc. Therefore, 2011.different ways have been found to automatically [13] Wen-ChanLee and Bor-Wen Cheng; ‘Ananalyze the data, to summarize it, to discover and Intelligent system for improving performance ofcharacterize trends in it and to automatically flag blood donation’, Journal of Quality, Vol. 18, Issueanomalies. The several data mining techniques are No. II, 2011.introduced by the different researchers. These [14] P.Ramachandran et al. ‘Classifying bloodtechniques are used to do classification, to do donors using data mining techniques’; IJCST, Vol.clustering, to find interesting patterns. In our future I, Issue1, February 2011.work, the data mining techniques will be [15] F. Zabihi et al. ‘Rule Extraction for Bloodimplemented on blood donor’s data set for predicting donators with fuzzy sequential pattern mining’; Thethe blood donor’s behaviour and attitude, which have Journal of mathematics and Computer Science; Vol.been collected from the blood bank center. II, Issue No. I, 2011. [16] Shyam Sundaram and Santhanam T: ‘A comparison of Blood donor classification dataREFERENCES mining models’, Journal of Theoretical and Applied [1] Badgett RG: How to search for and evaluate Information Technology, Vol.30, No.2, 31 August medical evidence. Seminars in Medical Practice 2011. 1999, 2:8-14, 28. [17] Shyam Sundaram and Santhanam T; ‘Real [2] Jaiwei Han and Micheline Kamber, ‘Data Time Blood donor management using Dash boards Mining Concepts and Techniques’, Second Edition, based on Data mining models’, International Morgan Kaufmann Publishers. Journal of Computing issues, Vol.8, Issue5, No.2, [3] ‘Data Mining Introductory and Advanced September 2011. Topics’ by Margaret H. Dunham. [18] Prof. Dr. P.K. Srimani et al. ‘Outlier data [4] Sunita B Aher, LOBO L.M.R.J., International mining in medical databases by using statistical Conference on Emerging Technology Trends methods’, Vol.4, No.1, January 2012. (ICETT)-2011 Proceedings published by [19] Alaa hamouda et al; ‘Automated Red Blood International Journal of Computer Applications Cells counting’, International Journal of Computing (IJCA). Science, Vol.1, No.2, February 2012. [5] B.G. Premasudha et al., ‘An Application to find [20] Ivana D. Radojevic et al. ‘Total coliforms and spatial distribution of Blood Donors from Blood data mining as a tool in water quality monitoring’, Bank Information’,Vol. II, Issue No.2, July- African Journal of Microbiology Research, Vol. December 2009. 6(10), 16 March 2012. [6] Jian Mu et al., ‘Segmentation, Reconstruction, [21] P.Yasodha, M. Kannan; Analysis of a and analysis of blood thrombus formation in 3 D- Population of Diabetic Patients Databases in Weka 2photon microscopy images’, EURASIP Journal on Tool; International Journal of Scientific & Advances in Signal Processing, 10 July 2009. Engineering Research Volume 2, Issue 5, May 2011. [7] Abdur Rashid Khan et and al. ‘Web based . Information System for Blood Donation’, Author’s Profile International Jounal of digital content Technology and its applications,Vol. 3, Issue No.2, July 2009. [8] G. Satyanaryana Reddy et al; ‘Management Information System to help Managers for providing decision making in any organization’. [9] T. Santhanm and Shyam Sunderam: ‘Application of Cart Algorithm in Blood donor’s classification’, Journal of computer Science Vol. 6, Issue 5. Ankit Bhardwaj received the B.Tech. Degree in [10] Zhangetal. ‘PPLook: an automated data Information Technology from Maharishi Dayanand mining tool for protein–protein interaction’, BMC University, Rohtak, Haryana, India in 2009 and pursuing Bio-informatics- 2010. M.Tech (Comp.Sc. & Engg) from Haryana College of 1308 | P a g e
  7. 7. Ankit Bhardwaj, Arvind Sharma, V.K. Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1303-1309Technology and Management, Kaithal, KurukshetraUniversity, Kurukshetra. His research interest lies in thearea of Data Mining.Arvind Sharma received the M.Sc. Degree in ComputerScience from Maharishi Dayanand University, Rohtak,Haryana, India in 2003, M.C.A. Degree from JRNRVUniversity Udaipur, Rajasthan, India, M.Phil. Degree inComputer Science with specialization in Web DataMining Applications from the Alagappa University,Karaikudi, Tamil Nadu, India and pursuing Ph.D.Computer Science from Jaipur National University,Jaipur, Rajasthan, India. He has published many technicalpapers in International and National reputed journals andconferences. His research interest lies in the area of DataMining and Web Applications.V.K. Shrivastava is working as an Associate Professor andHOD, Department of Information Technology, HaryanaCollege of Technology and Management, Kaithal,Haryana, India. He has published many research papers inInternational and National reputed journals andconferences. His research interest lies in the area ofDigital Signal Processing. 1309 | P a g e