What is data mining?
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. It starts by developing a representation of sample data, which is then extended to larger sets of data, working on the premise that the larger data has a structure similar to that of the sample data.
Data mining Applications
Data mining is applicable in almost all areas, whether in business or in science. Its purpose and benefits differ depending on where the technique is applied.
Data mining Applications: Retail/Marketing
Identifying buying patterns of customers.
Finding associations among customer demographic characteristics.
Predicting response to mailing campaigns.
Market basket analysis.
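Market basket analysis can be illustrated with a small sketch. The example below counts how often each pair of items appears in the same basket and reports that as a fraction of all baskets (the pair's support). The function name `pair_support` and the transactions are hypothetical, invented for illustration only; real tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicate items and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical transactions (assumption: not taken from any real data set)
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

support = pair_support(baskets)
for pair, value in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

Pairs with high support are candidates for joint promotions or shelf placement. A fuller analysis would also look at rule confidence, which this sketch does not compute.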
Data mining Applications: Banking
Detecting patterns of fraudulent credit card use.
Identifying loyal customers.
Predicting customers likely to change their credit card affiliation.
Determining credit card spending by customer groups.
Data mining Applications: Insurance
Claims analysis.
Predicting which customers will buy new policies.
Data mining Applications: Medicine
Characterizing patient behavior to predict surgery visits.
Identifying successful medical therapies for different illnesses.
Data mining Operations
The four main operations of data mining are:
Predictive modeling
Database segmentation
Link analysis
Deviation detection
Data mining Operations
Predictive modeling: Uses observations to form a model of the important characteristics of some phenomenon.
Database segmentation: Partitions a database into an unknown number of segments, or clusters, of similar records.
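Database segmentation can be sketched as follows. This is a minimal single-pass, threshold-based grouping (sometimes called leader clustering), chosen here only because it does not need the number of segments up front; it is not a method named on these slides. The `radius` parameter, the function name `segment`, and the customer records are all assumptions made for illustration.

```python
import math


def segment(records, radius):
    """Group records into segments; the number of segments is not fixed in advance."""
    segments = []  # each segment is a list of records; its first record acts as the leader
    for rec in records:
        for seg in segments:
            # Join the first segment whose leader is within `radius` of this record
            if math.dist(rec, seg[0]) <= radius:
                seg.append(rec)
                break
        else:
            # No existing segment is close enough, so start a new one
            segments.append([rec])
    return segments


# Hypothetical (age, monthly spend) records, invented for this sketch
customers = [(25, 20), (27, 22), (60, 80), (62, 78), (26, 21)]

segments = segment(customers, radius=10)
for i, seg in enumerate(segments):
    print(f"segment {i}: {seg}")
```

The result depends on the record order and on the chosen radius, which is one reason production tools tend to use more robust clustering methods.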
Data mining Operations
Link analysis: Establishes links, called associations, between individual records or sets of records in a database.
Deviation detection: The newest data mining operation. It is often a source of true discovery because it identifies outliers, which express deviation from what is expected.
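As a rough illustration of deviation detection, the sketch below flags values that lie far from the mean, measured in standard deviations. The z-score rule, the threshold of 2.0, the function name `find_deviations`, and the claim amounts are assumptions chosen for this example; the slides do not prescribe a particular technique.

```python
import statistics


def find_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical claim amounts, invented for this sketch
claims = [10, 11, 9, 10, 12, 10, 50]

outliers = find_deviations(claims)
print(outliers)
```

Because an extreme value also inflates the mean and standard deviation, this simple rule can miss outliers in small samples; median-based measures are a common alternative.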
Data mining Process
The Cross-Industry Standard Process for Data Mining (CRISP-DM) specifies a data mining process model that is not specific to any industry or tool. It evolved from the knowledge discovery processes used widely in industry and in direct response to user requirements.
Data mining Process (cont…)
The major objectives of this specification are to make large data mining projects run more efficiently and to make them cheaper, more reliable, and more manageable. CRISP-DM is a hierarchical process model.
Data mining Process (cont…)
The process is divided into six generic phases, ranging from business understanding to deployment of project results. The phases of the CRISP-DM model are:
Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment
Data mining Process (cont…)
Business understanding: This phase focuses on understanding the project objectives and requirements from the business point of view.
Data understanding: This phase includes tasks for the initial collection of the data and is concerned with establishing the main characteristics of the data.
Data mining Process (cont…)
Data preparation: This phase involves all the activities for constructing the final data set on which modeling tools can be applied directly.
Modeling: This phase is the actual data mining operation and involves selecting modeling techniques, selecting modeling parameters, and assessing the model created.
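The modeling phase can be sketched in a few lines: choose a technique, fit its parameters on prepared data, and assess the result on data held back from fitting. The example assumes a one-variable least-squares line as the technique and mean absolute error as the assessment measure; both choices, the function names, and the data are hypothetical and stand in for whatever a real project would select.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Assess a fitted line by its average absolute prediction error."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical prepared data: a training set for fitting, a held-out set for assessment
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
test_x, test_y = [5, 6], [10, 13]

model = fit_line(train_x, train_y)
error = mean_abs_error(model, test_x, test_y)
print("slope, intercept:", model)
print("held-out mean absolute error:", error)
```

Assessing on held-out data rather than on the training set gives a more honest picture of how the model might behave on new records.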
Data mining Process (cont…)
Evaluation: This phase validates the model from the data analysis point of view. The model and the steps in modeling are verified within the context of achieving the business goals.
Deployment: This phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.