Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Appli...
Upcoming SlideShare
Loading in …5
×

Ex33900906

250 views

Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
250
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Ex33900906

  1. 1. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906900 | P a g eData Mining and Knowledge Discovery: Applications,Techniques, Challenges and Process Modelsin HealthcareShaker H. El-Sappagh1, Samir El-Masri2, A. M. Riad3, MohammedElmogy41Faculty of Computers & Information, Mansoura University, Egypt2College of Computer and Information Sciences, King Saud University, Saudi Arabia3,4Faculty of Computers & Information, Mansoura University, EgyptABSTRACTMany healthcare leaders find themselvesoverwhelmed with data, but lack the informationthey need to make right decisions. KnowledgeDiscovery in Databases (KDD) can helporganizations turn their data into information.Organizations that take advantage of KDDtechniques will find that they can lower thehealthcare costs while improving healthcarequality by using fast and better clinical decisionmaking. In this paper, a review study is done onexisting data mining and knowledge discoverytechniques, applications and process models thatare applicable to healthcare environments. Thechallenges for applying data mining techniquesin healthcare environment will also be discussed.Key Words: Data mining, knowledge discovery,healthcare, Electronic health record, healthinformatics.I. INTRODUCTIONThe advances in health informatics asElectronic Health Record (HER) make healthcareorganizations overwhelmed with data.There is awealth of data available within the healthcareindustry that would benefit from the application ofKDD tools and techniques. These techniquestransform the huge mounds of data into usefulinformation for decision making [3]. A propermedical database created with intention mining canprovide a useful resource for data mining andknowledge discovery [1]. The tools and techniquesof KDD have achieved impressiveresults in otherindustries, and healthcare needs to take advantage ofadvances in this exciting field. A hospital typicallyhas detailed data about every charge entered on apatient’s bill, which easily can reach hundreds ofcharges for only a few days’ stay. Each lab test,radiology procedure, medication, and so on isrecorded, whether in a clinical information systemor in a financial billing system.There is an enormous volume of datagenerated [8], but few tools exist in the healthcaresetting to analyze the data fully to determine the bestpractices and the most effective treatments. Ingeneral, the healthcare industry lags far behind otherindustries in terms of information technologyexpenditures. As healthcare continues to becomemore complex, the industry needs to find aneffective means of evaluating its large volume ofclinical, financial, demographic and socioeconomicdata.Clinical decisions are often made based ondoctors’ intuition and experience rather than on theknowledge rich data hidden in the database. Thispractice leads to unwanted biases, errors andexcessive medical costs which affects the quality ofservice provided to patients. Integration of KDDtools with EHR could reduce medical errors,enhance patient safety, decrease unwanted practicevariation, and improve patient outcome.EHR is only a first step in capturing and utilizinghealth-related data – the problem is turning that datainto useful information. Models produced via datamining and predictive analysis can form thebackbone of Clinical Decision Support Systems(CDSS).KDD(also known as knowledge extraction,data/pattern analysis, data archeology, datadredging, information harvesting, and businessintelligence) is the extraction of interesting (non-trivial, implicit, previously unknown and potentiallyuseful) meaningful patterns or knowledge from hugeamount of data stored in multiple data sources suchas file systems, databases, data warehouses and etcby automatic or semi-automatic means [10].KDD has evolved, and continues to evolve, from theintersection of research fields such as machinelearning, pattern recognition, databases, statistics,AI, knowledge acquisition for expert systems, datavisualization, and high-performance computing.More traditional query tools require the user tomake many assumptions. For example, a user mayuse a query tool to ask the question, “What is thelink between cholesterol and heart disease?” Byrunning this query, the analyst assumes that there isa relationship between cholesterol and heart disease.The power of KDD is that it will search the datasetfor all relationships, including those that may nothave occurred to the analyst. With large datasets,there may be many variables interacting with oneanother in very subtle ways. KDD can help find thehidden relationships and patterns withindata.TheKDD is well-defined process consisting of
  2. 2. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906901 | P a g eseveral distinct steps. Data mining is the core step,which results in the discovery of hidden but usefulknowledge from massive databases. This processmust have a model to control its execution steps.KDD has many applications in healthcaresuch as patient diagnosis, patient treatment,management of chronic diseases, prediction ofpatients at risk for specific diseases, and in publichealth.Because of the complexity of healthcareenvironment, there are many challenges that face theapplication and complete benefit from KDD. Thispaper is organized as follows: section 2 discussesdata mining applications. Section 3 discusses KDDprocess models. Section 4 discusses data miningtechniques. Section 5 is the data mining challenges,and conclusion is discusses in section 6.II. DATA MINING APPLICATIONS INHEALTHCAREThere is vast potential for data miningapplications in healthcare. Generally, these can begrouped as the evaluation of treatment effectiveness;management of healthcare; andCustomerRelationship Management (CRM). More specializedmedical data mining, such as analysis of DNAmicro-arrays, lies outside the scope of this paper.- Evaluation of treatment effectiveness:Data miningapplications can be developed to evaluate theeffectiveness of medical treatments (evidence basedmedicine). By comparing and contrasting causes,symptoms, and courses of treatments, data miningcan deliver an analysis of which courses of actionprove effective such as predict optimum medicationdosage.- Healthcare management: Data mining applicationscan be developed to better identify and track chronicdisease states and high-risk patients, designappropriate interventions, and reduce the number ofhospital admissions and claims. It can search forpatterns that might indicate an attack by bio-terrorists.Moreover, this system can be used forhospital infection control, or as an automated early-warning system in the event of epidemics. Accurateprognosis and risk assessment as survival analysisfor AIDS patients, predict pre-term birth risk,determine cardiac surgical risk, predict ambulationfollowing spinal cord injury, and breast cancerprognosis.- CRM: Customer interactions may occur throughcall centers, physicians’ offices, billing departments,inpatient settings, and ambulatory care settings. Itdetermines the preferences, usage patterns, andcurrent andfuture needs of individuals to improvetheir level of satisfaction.Other applications of KDD in healthcare are:- Many providers are migrating toward the use HER[12]. EHR store a large quantity of patient data ontest results, medications, prior diagnoses, and othermedical history. This is a valuable source ofinformation that could be better used by employingKDD techniques. Several examples includeidentifying patients who should receive flu shots,enroll in a disease management program and are notin compliance with a treatment plan. Moreover,historical data in EHR can help in management ofchronic diseases and anticipating patient’s futurebehavior on the given history. EHR stores spatialand demographic data which can help in publichealth management and planning.- Finding themselves in an increasingly competitivemarket, many healthcare organizations are nowemploying sophisticated marketing efforts. KDDcan help in this arena in ways similar to otherindustries. Organizations can use their data toidentify those most likely to use their facilities andthe most effective marketing activities to reach thoseindividuals.- KDD has proved as successful at identifying fraudin the healthcare industry as it has in otherindustries. Several insurance companies use KDDtechniques to sift through their claims, seeking toidentify fraudulent providers.- KDD is used to predict patient problems based onhis medical history.- When researchers investigate the records ofpatients with a particular disease to determine ifthere are any risk factors in their histories that couldhelp predict the occurrence of the disease in otherpatients; they may identify a new risk factor thatcould help detect the disease sooner in other patientsand allow for more timely intervention.- Data mining techniques could be successfullyapplied for detection and diagnosis with areasonably high performance of many medicaldiseases as cancer [4], diabetes [6, 24], kidney [7]heart diseases [2], andsleep problems [5].- Hospital management for optimize allocation ofresources and assist in future planning for improvedservices. For example, forecasting patient volume,ambulance run volume, etc and predicting length-of-stay for incoming patients.III. DATA MINING PROCESS MODELSIN HEALTHCAREKnowledge discovery is a process, and nota one-time response of the KDD system to a usersaction. As any other process, it has its environment,its phases, and runs under certain assumptions andconstraints. Figure 1 shows a typical decisionmaking environment. In the healthcare environment,the source data in clinical databases and/or EHRscan be queried directly using SQL. A DataWarehouse (DW) can be created to integrate datafrom many sources and enhance data quality.Analysts can apply OLAP tools on the DW. Datawarehouse is not enough for data analysis. Datamining is required to discover hidden patterns ineither EHR or DW.
  3. 3. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906902 | P a g eKDD is an iterative or cyclic process that involves anumber of stages. Although the specific techniquesmay vary from project to project, the basic processis the same for all KDD problems.Following standard process in KDD help analysts byguiding them through analysis process exposingthose aspects that could otherwise be forgotten orneglected. Figure 2shows the basic phases of theKDD process [11].Selection phase generates the target data setfrom the database. Preprocessingsolve issuesaboutnoise, incomplete and inconsistent data.Thenext phase is transformation of the preprocesseddata into a suitable form for performing the desiredDM task. In the DM phase, a procedure is run thatexecutes the desired DM task and generates a set ofpatterns. However, not all of the patterns are useful.The goal of interpreting and evaluating all thepatterns discovered is to keep only those patternsthat are interesting and useful to the user and discardthe rest. Those patterns that remain represent thediscovered knowledge.This process is a time-consuming,incremental, and iterative process by its very nature,hence many repetition and feedback loops exists inFigure 2. Individual phases can be repeated alone,and the entire process is usually repeated fordifferent data sets.Many data mining process methodologiesare available. However, the various steps do notdiffer much from methodology to methodology. Themajor KDD process models will be discussednext:The first reported KDD model consists of ninesteps and was developed by Fayyad et al. in the mid-1990s [10]. The next model, by Cabena et al.,consists of five steps and was introducedin 1998[14]. The third model, which consists of eight steps,was developed byAnand& Buchner at about thesame time [15]. TheCRISP-DM (CRoss-IndustryStandard Process for DM) process model thatincludes six steps wasfirst proposed in early 1996[17] by a consortium of four companies: SPSS (aprovider of commercialDM solutions), NCR (adatabase provider), Daimler Chrysler, and OHRA(an insurancecompany). The last two companiesserved as sources of data and case studies. Next isthe six-step process model of Cios et al. [16] byadopting the CRISP-DM model to the needsofacademic research community. The mainextensions of the latter model include providing amoregeneral, research-oriented description of thesteps, introduction of several explicitfeedbackmechanisms and a modification of thedescription of the last step, which emphasizes thatknowledgediscovered for a particular domain maybe applied in other domains. Another model is thesix-step KDD process model by Adriaans&Zantinge(1996), which consists of DataSelection, Cleaning,Enrichment, Coding, DM, and Reporting. Anothermodel is the four-step model by Berry &Linoff(1997), which consists of Identifying the Problem,Analyzingthe Problem, Taking Action, andMeasuring the Outcome. Another model is the five-step SEMMA model by the SAS Institute Inc.(1997), which consists of steps namedSample,Explore, Modify, Model, and Assess. This modelwas incorporated into commercial KDsoftwareplatform SAS Enterprise Miner. Another model isthe seven-step model by Han &Kamber (2001),which consists of Learning the Applicationdomain,Creating a Target Data Set, Data Cleaning andPreprocessing, Data Reduction andTransformation,Choosing Functions of DM, Choosing the MiningAlgorithm(s), DM, PatternEvaluation andKnowledge Presentation, and Use of DiscoveredKnowledge. Another model is the five-step modelby Edelstein (2001), which consists of Identifyingthe Problem, Preparing theData, Building theModel, Using the Model, and Monitoring theModel. Another model is the seven-step model byKlosgen&Zytkow (2002), which consist ofDefinition and Analysis ofBusiness Problems,Understanding and Preparation of Data, Setup of theSearch for Knowledge,Search for Knowledge,Knowledge Refinement, Application of Knowledgein Solving theBusiness Problems, and Deploymentand Practical Evaluation of the Solutions. Finally isthe seven-step model by Haglin et al. (2005), whichconsists of Goal Identification, Target DataCreation,Data Preprocessing, Data Transformation, DM,Evaluation and Interpretation, andTake Action steps.A. NEW TRENDS IN KDD PROCESSMODELSThe future of KDD process models is inachieving overall integration and distribution of theSelectFromGroup Byx y z123SQL QuerySimple SQL Query & ReportingData warehouseData Warehouse CreationSlice & Dice ReportingTransformation Mining KnowledgeData Selectionand CleaningKnowledge DiscoveryDataSourcesInternalSources(EHR)ExternalSourcesFigure 1:Decision Making EnvironmentFigure 2: Phases in the KDD process
  4. 4. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906903 | P a g eentire process through the use of other popularindustrial standards. Another currently veryimportant issue is to provide interoperability andcompatibility between different software systemsand platforms, which also concerns KDD models.Such systems would serve end-users in automating,or more realistically semi-automating the KDDsystems.A current goal is to enable users to carryout KDD projects without possessing extensivebackground knowledge, without manual datamanipulation, and without manual procedures toexchange data and knowledge between differentDM methods. This requires the ability to store andexchange not only the data, but most importantlyknowledge that is expressed in terms of data modelsgenerated by the KDD process and domain expert,and meta-data that describes data and domainknowledge used in the process.The technologies,which can help in achieving these goals, are XMLand Predictive Model Markup Language (PMML).From the KDD point of view, XML is the keytechnology to [18]: Standardize communication between diverse datamining (DM) tools and databases. Build standard data repositories sharing databetween different DM tools that work on differentsoftware platforms. Implement communication protocols between theDM tools. Provide a framework for integration andcommunication between different DMKD steps.The information collected during domain and dataunderstanding steps can be stored as XMLdocuments. Then, the information can be used fordata preparation and data mining steps, as asource of already-accessible information, cross-platforms, and cross-tools.IV. DATA MINING TECHNIQUES INHEALTHCAREThere are a variety of data miningtechniques available, all with pros and cons,depending on the business problem at hand and thedata available for analysis. One of the most difficulttasks is to choose the right data mining techniquewhich requires more and more expertise. The finalchoice depends basically on:- The main goal of the problem to be solved.- The structure of the available data.Conceptual map ofmining technique helps inselection for non-expert miners [26]. Discoveredmodel (pattern, knowledge) from data miningalgorithms can be predictive if they predict the valueof an attribute (referred to as the target attribute ortarget), by making use of the remaining attributes(referred to as predictive attributes). This is alsocalled supervised learning. The target attribute is thefocus of this process, and the data usually includesexamples where the values of the target attributehave been observed. The goal is to build a modelthat can predict the value of the target attribute fornew unseen examples. If the target value is nominalthen the prediction task is known as classificationand each possible value for the target variable isreferred to as its class-label or class. For real-valuedtarget attributes, the prediction task is known asregression. Unsupervised learning refers tomodeling with an unknown target variable. In thatcase, models are solely descriptive. The goal of theprocess is to build a model that describes interestingregularities in the data. Clustering is an example ofa descriptive algorithm that is concerned withpartitioning the examples in similar subgroups.A. A GLANCE AT DATA MININGTECHNIQUESData mining techniques can be classifiedbased on the database, the knowledge to bediscovered and the techniques to be utilized.Classification: if the target attribute is categoricalthen the applicable techniques are decision treeinduction, Bayesian classification, backpropagation(neural network), based on concepts fromassociation rule mining, k-nearest neighbor, casebased reasoning, genetic algorithms, rough settheory, support vector machine and fuzzy set. If thetarget attribute is continuous then we use linear,multiple and non-linear regression.Clustering: many methods are applicable aspartitioning methods (e.g. k-means, k-medoids),hierarchical methods (e.g. chameleon, CURE),density-based methods (e.g. DBSCAN, OPTIC),grid-based methods (e.g. STING, CLIQUE) andmodel-based methods (e.g. statistical and neuralnetwork approaches).Association rules:There are many classifications ofthese algorithms such as according to the types ofvalues handled (e.g. Boolean and quantitative rules),according to the number of dimensions (e.g. singleand multi dimensional) and according to the level ofabstraction (e.g. single and multi level rules).Graph mining, Social Network Analysis (SNA), linkmining and multi-relational data mining: thesetechniques become increasingly important inmodeling sophisticated structures and theirinteractions and relationships.Data mining algorithms are needed inalmost every step in KDD process ranging fromdomain understanding to knowledge evaluation. Forexample, in data preprocessing, algorithms (e.g.genetic algorithms) are used in data integration,transformation and cleaning and in knowledgediscovery. Because of space restriction, we will notgive examples of applicable techniques in healthcarereal problems.The most used software tools that provide thesetechniques are SPSS/ SPSS Clementine, Salford
  5. 5. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906904 | P a g eSystems CART/MARS/TreeNet/RF, Yale (opensource), SAS / SAS Enterprise Miner, AngossKnowledge Studio / Knowledge Seeker, KXEN,Weka (open source), R (open source), MicrosoftSQL Server, MATLAB and Oracle Data Miner.V. DATA MINING CHALLENGES INHEALTHCAREApplication of KDD in healthcare faces manychallenges as [19, 20]:1- Need for algorithms with very high accuracybecause it is an issue of life or death. The problemsof missing, noisy and inconsistent data complicatethis objective. Moreover, unlike other (arguablysimpler) domains, the medical discipline itself isdiverse, complex and, to an outsider, relativelyopaque.2- Active data mining: It needs two types oftriggers. First type used to fire data miningtechnique to analyze the data automatically aftersome time or making some updates. The secondtype is to enforce the discovered knowledge byembedding the discovered knowledge within clinicalinformation systems.3- For the same problem, we need to apply manydata mining techniques, compare their results andselect the most interesting one.4- Automated or semi-automated KDD system: thischallenge is critical since EHR data and medicalknowledge are changed continuously. KDD processneeds to start automatically or with a minor helpfrom analyst.5- Previous (background) knowledge as concepthierarchy, domain expert knowledge, previouslymined knowledge and rule template must be takenin accountin data preparation, modeling and modelevaluation phases [25]. Background knowledge canbe expressed in different formats: examples may befound in the areas of decision rules, Bayesianmodels, fuzzy sets and concept hierarchies. Figure 3shows that background knowledge must be taken into account in the KDD process.5- The result of data mining system must beappended in the existing knowledge base(knowledge fusion). The new knowledge mayupdate, remove, or add constraints to the existingknowledge.6- Longitudinal, temporal and spatial support: Thereis a need for advancement in data mining techniquesto deal with EHR environment [27]. EHR containsclinical data collected and summarized from manyhealthcare systems, temporal data forming patienthistory, social network of patients (Patient ID,Mother ID, Father ID, and Family ID), and spatialdata as patient addresses. EHR may contain datawith different formats as audio, video, image andtext data. These formats also need advanced datamining techniques. The use of static techniques thusoversimplifies (or may hide) possible relationshipsand thus support for longitudinal, temporal andspatial semantics within the mining process is highlydesirable. Episodic data is often the key to good datamining. Techniques as anomaly detection [21],difference detection [22], longitudinal X analysis[22] and temporal/spatial X analysis [23] are used.7- Mining complex knowledge from complex data:Using multi-relational data mining, miningknowledge in the form of graphs and mining non-relational data as text and image are futureissues.Using text mining is challenging in analysisof physician free text describing patient diagnosesand free text prescription. It can be used tosummarize patient conditions.8- Utilizing data mining in CDSS creates a systemof real-time data-driven clinical decision support, or“adaptive decision support.” It can “learn” overtime, and can adapt to the variation seen in theactual real-world population. The approach is twopronged – developing new knowledge abouteffective clinical practices as well as modifyingexisting knowledge and evidence-based models tofit real-world settings [9].9- Most KDD systems today work with relationaldatabases. KDD and data mining need to beextended to object-oriented and multimediadatabases.10- Various security and privacy aspects: how toensure the users’ privacy while their data are beingmined? One solution isAnonymousness andidentification transformation. Besides datapreprocessing, in order to separate the relativebetween patients and their records which mayreferenced some private information, anonym-oussness and identification transformation are alsoneeded.11- Distributed data mining, mining heterogeneousand multi-agent data: many schemas can be utilizedas collecting data from all sites and then mine it, ormine the local data and distribute the discoveredknowledge. In distributed mining, one problem ishow to mine across multiple heterogeneous datasources such as multi-database and multi-relationalmining.12- Large databases: clinical databases are verylarge and massive with hundreds of tables andfields, millions of records. Using more efficientalgorithms, sampling, approximation, featureselection, and parallel processing can mitigate theproblem.13- High dimensionality: clinical environment has avery large number of fields. It can be solved byusing prior knowledge to know irrelevant variables.Figure 3: Background knowledge role in KDD process
  6. 6. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906905 | P a g e14- Changing data and knowledge: Rapidlychanging data make previously discoveredknowledge invalid. Solving the problem ofautomatic KDD system can solve this problem.15- Integration with other systems: A standalonediscovery system may not be very useful.Integration issues include integration with a DBMS(e.g. via a query interface), CPOE (Computer-basedPhysician Order Entry system), spread sheets andvisualization tools.16- Distributed data mining: As EHR and clinicaldatabases are mostly distributed. This situationrequire enhancement to data mining algorithms andKDD process to work with distributed data.17- Developing a unified framework of data miningthat can deal with the healthcare environment.18- Mining sequence data and time series data: Aparticularly challenging problem is the noise in timeseries data. It is an important open issue to tackle19- Encoding problem: Healthcare data comprisesboth numeric and textual information formedications diagnostic tests, demographics,problem lists, staff notes and images for radiology.Standard coding, such as International Classificationof Disease (ICD-9-CM) codes for diagnoses andNational Drug Codes (NDC) for drugs, SNOMEDCT (Systematized Nomenclature of Medicine -Clinical Terms), Logical Observation IdentifiersNames and Codes (LOINC) and Classification ofInterventions and Procedures (OPCS-4) wasimplemented when possible rather than having freetext dataentry to facilitate research and analyses.20- Collecting the default knowledge is a challenge.Default knowledge as “Only women can pregnancy”is used by domain expert or physician when makingdecision. This knowledge needs to be collected andadded in the data mining system’s knowledge base.As a result, if a person is pregnant, then the system,by default, knows that she is a woman.VI. CONCLUSIONIn this paper, we tried to make a review ofall KDD applications in healthcare. As thehealthcare is a complex environment, there aremany useful and proved applications of data miningsuch as diagnoses, prognoses, treatment and others.The existing process models for data mining lifecycle are also reviewed. These models are criticalfor any KDD project because knowledge discoverycontains many phases. Existing data miningalgorithms are also reviewed, and all problems,challenges that face KDD in healthcare environmentare collected.REFERENCES[1] S. K.Wasan, V. Bhatnagar, H. Kaur, Theimpactofdataminingtechniquesonmedicaldiagnostics,in the proceeding of DataScience Journal, Vol. 5, 2006, 119-126.[2] K.Srinivas, Applications of Data MiningTechniques in Healthcare and Prediction ofHeart Attacks, International Journal onComputer Science and Engineering(IJCSE), 02(02), 2010, 250-255.[3] M. Khajehei, F. Etemady, Data Mining andMedical Research Studies, IEEE SecondInt. Conference on ComputationalIntelligence, Modeling and Simulation(CIMSiM),2010, 119 – 122.[4] L. Li, et al., Data mining techniques forcancer detection using serum proteomicprofiling, Artificial Intelligence inMedicine, vol. 32(2), 2004, 71-83.[5] O. Shmiel, et al., Data mining techniquesfor detection of sleep arousals, Journal ofNeuroscience Methods, vol. 179(2), 2009,331-337.[6] J. L. Breault,Data mining a diabetic datawarehouse, Journal of ArtificialIntelligence in Medicine, 26 (1-2), 2002,Pages 37-54.[7] A. Kusiak, et al., Predicting survival timefor kidney dialysis patients: a data miningapproach, Computers in Biology andMedicine, vol. 35(4), 2005, 311-327.[8] N. Aggarwal, A. Kumar, H. Khatter, V.Aggarwal, Analysis the effect of datamining techniques on database, Advancesin Engineering Software 47(1), 2012, 164–169.[9] C. Bennett, T. Doub; Data Mining andElectronic Health Records: SelectingOptimal Clinical Treatments in Practice,Proceedings of the 6th Int. Conference onData Mining, pp. 313-318, 2011.[10] Fayyad, U., Piatetsky-Shapiro G. andSmyth, P., from data mining to knowledgediscovery: an Overview, In Fayyad, U,Piatetsky-Shapiro, G, Smyth, P andUthurusamy, R (eds) Advances inKnowledge Discovery and Data Mining.AAAI Press, 1996, 1-34.[11] U. Fayyad, P. Smyth, G. Piatetsky,knowledge discovery and data miningtowards a unifying framework, KDD-96,American Association for ArtificialIntelligence (AAAI), 1996.[12] LI Yun, LI Xiang-sheng,The data miningand knowledge discovery inbiomedicine,IEEE 5thInt. Conf.onComputer Science & EducationHefei(ICCSE), China. 2010, 1050, 1052.[13] J. Han, M. Kamber, Data Mining: Conceptsand Techniques, 2ndedition, MorganKaufmann, 2006.[14] Cabena, P, Hadjinian, P, Stadler, R,Verhees, J and Zanasi, A, Discovering DataMining: From Concepts to Implementation,Prentice Hall. 1998.
  7. 7. Shaker H. El-Sappagh, Samir El-Masri, A. M. Riad, Mohammed Elmogy / InternationalJournal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.comVol. 3, Issue 3, May-Jun 2013, pp.900-906906 | P a g e[15] Anand, S, Buchner A; Decision SupportUsing Data Mining, Financial TimeManagement, London. 1998.[16] Cios, K, Kurgan L.,Trends in data miningand knowledge discovery,” In Pal, N andJain, L (eds) Advanced Techniques inKnowledge Discovery and Data Mining.Springer, pp. 1–26. 2005.[17] CRISP-DM, 2003, CRoss IndustryStandard Process for Data Mining,http://www.crisp-dm.org.[18] K. J. Cios, G. W. Moore, Uniqueness ofMedical Data Mining, ArtificialIntelligence in Medicine journal, 26(1-2),2002, 1-24.[19] Q. YANG, 10 Challenging problems indata mining research, Int. Journal ofInformation Technology & DecisionMaking, 5(4)2006, 597–604.[20] Ruben D. Canlas Jr., Data mining inhealthcare: Current applications and issues,MSc. of Science in InformationTechnology, Carnegie Mellon University,Australia 2009.[21] Y. Li, N. Wu, S. Jajodia, X.S.Wang,Enhancing profiles for anomalydetection using time granularities,” J. ofComputer Security, 10(1,2), 2002, 137–157.[22] Anthony K. H. Tung, Hongjun Lu, JiaweiHan, Ling Feng, Efficient mining ofintertransaction association rules, IEEETransactions on Knowledge and DataEngineering, 15(1), 2003, 43–56.[23] C. Freksa, Temporal reasoning based onsemi-intervals, Artificial Intelligence, 54,1992, 199–227.[24] Noah Lee, A. F. Laine, J. Hu, F. Wang, J.Sun, Sh.Ebadollahi, Mining electronicmedical records to explore the linkagebetween healthcare resource utilization anddisease severity in diabetic patients, FirstIEEE Int. Conf. on Healthcare Informatics,Imaging and Systems Biology (HISB), onpage(s): 250 – 257, 2011.[25] F. Alonso, L. Martínez, A. Pérez, J. P.Valente,Cooperation between expertknowledge and data mining discoveredknowledge: Lessons learned, ACM AnInternational Journal on Expert Systemswith Applications, 39(8), 2012, 7524–7535.[26] K. Gibert, M. Sànchez-Marrè, V. Codina,Choosing the Right Data MiningTechnique: Classification of Methods andIntelligent Recommendation Techniques,International Environmental Modeling andSoftware Society (iEMSs), www.iemss.org,2010.[27] G. Hripcsak, C. Knirsch, L. Zhou, A.Wilcox, G. B. Melton, Bias Associatedwith Mining Electronic Health Records,Journal of Biomedical Discovery andCollaboration, 6, 2011, 48‐52.

×