
Data mining


My topic is data mining techniques. This presentation covers what data mining is, its types, and its applications.

Published in: Education

  1. 1. Submitted by II MCA, PSNACET.
  2. 2. • A review paper on various data mining techniques • Survey on various types of credit fraud and security measures • Data mining in cloud computing • Survey paper on clustering techniques • A data mining framework for prevention and detection of financial statement fraud • A review paper: mining educational data to forecast failure of engineering students • Data mining model for insurance trade in CRM system
  3. 3. Data mining • Data mining is the exploration and analysis of large data sets in order to discover meaningful patterns and rules. • The objective of data mining is to design and work efficiently with large data sets. • Data mining is a component of a wider process called knowledge discovery from databases (KDD). • Data mining is the process of analysing data from different perspectives and summarizing the results as useful information. • Data mining is a multi-step process that requires accessing and preparing the data, running a data mining algorithm, analysing the results, and taking appropriate action.
  4. 4. Why Data mining? • Database analysis and decision support: Market analysis and management (target marketing, customer relationship management, market basket analysis, cross selling, market segmentation); Risk analysis and management (forecasting, customer retention, improved underwriting, quality control, competitive analysis); Fraud detection and management • Other applications: text mining, intelligent query answering
  5. 5. In data mining, the data is mined using two learning approaches: supervised learning and unsupervised learning. Supervised learning: In supervised learning (often also called directed data mining), the variables under investigation can be split into two groups: explanatory variables and a dependent variable. The goal of the analysis is to specify a relationship between the dependent variable and the explanatory variables, as is done in regression analysis. Unsupervised learning: In unsupervised learning, all the variables are treated in the same way; there is no distinction between dependent and explanatory variables.
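The regression case mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not a method taken from the slides: the function name `fit_line` and the sample figures (`ad_spend` as the explanatory variable, `sales` as the dependent variable) are assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: explanatory variable and dependent variable.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

In the unsupervised setting there would be no `sales` column singled out as the target; every column would be treated alike.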
  6. 6. Tasks Of Data Mining: Data mining is used as a term for six specific classes of activities or tasks: • Classification • Estimation • Prediction • Affinity grouping or association rules • Clustering • Description and visualization. The first three tasks (classification, estimation, and prediction) are examples of directed data mining, or supervised learning. The next three tasks are examples of undirected data mining, or unsupervised learning.
  7. 7. Classification: Classification consists of examining the features of a newly presented object and assigning it to a predefined class. Estimation: Estimation deals with continuously valued outcomes. Prediction: Any prediction can be thought of as classification or estimation. Predictive tasks feel different because the records are classified according to some predicted future behavior or estimated future value. Association Rules: An association rule is a rule that implies certain association relationships among a set of objects in a database.
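Association rules are commonly scored with support and confidence. The slide does not name these measures, so treat the sketch below as one conventional way to quantify a rule; the basket data and the function names `support` and `confidence` are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here the rule "bread implies milk" holds in two of the three baskets that contain bread.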
  8. 8. Clustering: Clustering is the task of segmenting a diverse group into a number of more similar subgroups, or clusters. In clustering, there are no predefined classes. General Types of Cluster: • Well-separated cluster • Center-based cluster • Contiguous cluster • Density-based cluster • Shared property or conceptual cluster
  9. 9. Well-separated cluster: A cluster is a set of points such that any point in a cluster is closer to every other point in the cluster than to any point not in the cluster. Center-based cluster: A cluster is a set of objects such that an object in a cluster is closer to the "center" of its cluster than to the center of any other cluster. The center of a cluster is often a centroid.
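The center-based definition can be made concrete with a short sketch. It assumes Euclidean distance and uses the centroid (coordinate-wise mean) as the center; the helper names and the sample points are made up for illustration.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


# Two hypothetical clusters of 2-D points.
cluster_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
cluster_b = [(8, 8), (10, 8), (8, 10), (10, 10)]
centers = [centroid(cluster_a), centroid(cluster_b)]

# A new point belongs to the cluster whose center is nearest.
assigned = nearest_center((3, 3), centers)
print(centers, assigned)
```

Repeating the two steps (assign points to the nearest center, then recompute each centroid) is the core loop of k-means style algorithms.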
  10. 10. Contiguous cluster: A cluster is a set of points such that a point in a cluster is closer to one or more other points in the cluster than to any point not in the cluster. Density-based cluster: A cluster is a dense region of points that is separated from other high-density regions by low-density regions. Shared property: Find clusters that share some common property or represent a particular concept.
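One simple way to approximate contiguous clusters is to link any two points that lie within a chosen distance of each other and take the connected groups. The threshold `max_gap`, the function name, and the sample points are assumptions for this sketch; the slides do not prescribe an algorithm.

```python
import math


def contiguous_clusters(points, max_gap):
    """Group point indices so that each point is within `max_gap`
    of at least one other point in its group (connected components)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster = [start]
        frontier = [start]
        while frontier:
            current = frontier.pop()
            # Collect neighbours first, then remove them from the set.
            neighbours = [
                j for j in unvisited
                if math.dist(points[current], points[j]) <= max_gap
            ]
            for j in neighbours:
                unvisited.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical points: a chain of three, then a separate pair.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
groups = contiguous_clusters(points, max_gap=1.5)
print(groups)
```

Note that points 0 and 2 end up together even though they are farther apart than `max_gap`, because point 1 links them. That chaining is what distinguishes a contiguous cluster from a well-separated one.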
  11. 11. Description and visualization: Data visualization is a powerful form of descriptive data mining. It is not always easy to come up with meaningful visualizations, but the right picture really can be worth a thousand association rules, since human beings are extremely practiced at extracting meaning from visual scenes.
  12. 12. Data mining: KDD process
  13. 13. Steps of a KDD process • Learning the application domain: relevant prior knowledge and goals of the application • Creating a target data set: data selection • Data cleaning and preprocessing (may take 60% of the effort!) • Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation • Choosing the functions of data mining: summarization, classification, regression, association, clustering • Choosing the mining algorithm(s) • Data mining: search for patterns of interest • Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc. • Use of discovered knowledge
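A toy walk through the middle KDD steps (selection, cleaning, transformation, and a summarization function) might look like the sketch below. The records, field names, and helper functions are all hypothetical, and real pipelines are far more involved.

```python
# Hypothetical raw data; one record is incomplete.
raw_records = [
    {"customer": "a", "age": 25, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 40, "spend": 300.0},
    {"customer": "d", "age": 35, "spend": 200.0},
]


def select(records, fields):
    """Data selection: keep only the fields of interest."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that have missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def scale(records, field):
    """Transformation: min-max scale one field to [0, 1].
    Assumes the field takes at least two distinct values."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    return [{**r, field: (r[field] - lo) / (hi - lo)} for r in records]


def summarize(records, field):
    """Mining function (summarization): mean of one field."""
    values = [r[field] for r in records]
    return sum(values) / len(values)


target = select(raw_records, ["age", "spend"])
cleaned = clean(target)
transformed = scale(cleaned, "spend")
mean_scaled_spend = summarize(transformed, "spend")
print(len(cleaned), mean_scaled_spend)
```

Dropping incomplete records is only one cleaning policy; imputing the missing value is a common alternative.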
  14. 14. Major Issues in Data Mining: Mining methodology • Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web • Performance: efficiency, effectiveness, and scalability • Pattern evaluation: the interestingness problem • Incorporation of background knowledge • Handling noise and incomplete data • Parallel, distributed and incremental mining methods • Integration of the discovered knowledge with existing knowledge: knowledge fusion
  15. 15. Data mining in various fields: Market Analysis and Management • Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies • Target marketing: find clusters of "model" customers who share the same characteristics (interest, income level, spending habits, etc.); determine customer purchasing patterns over time • Cross-market analysis: find associations/correlations between product sales, and predict based on such associations • Customer profiling: what types of customers buy what products (clustering or classification)
  16. 16. Market Analysis and Management (cont.) • Customer requirement analysis: identify the best products for different customers; predict what factors will attract new customers • Provision of summary information: multidimensional summary reports; statistical summary information (data central tendency and variation)
  17. 17. Corporate Analysis & Risk Management • Finance planning and asset evaluation: cash flow analysis and prediction; contingent claim analysis to evaluate assets; cross-sectional and time series analysis (financial ratio, trend analysis, etc.) • Resource planning: summarize and compare the resources and spending • Competition: monitor competitors and market directions; group customers into classes and apply a class-based pricing procedure; set pricing strategy in a highly competitive market
  18. 18. Fraud Detection & Mining Unusual Patterns • Approaches: clustering and model construction for frauds, outlier analysis • Applications: health care, retail, credit card service, telecommunications • Auto insurance: ring of collisions • Money laundering: suspicious monetary transactions • Medical insurance: professional patients, ring of doctors, and ring of references; unnecessary or correlated screening tests • Telecommunications: phone-call fraud. Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
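The idea of flagging patterns that deviate from an expected norm can be sketched with a z-score test. This is one basic form of outlier analysis, not necessarily the approach the slides have in mind; the threshold of 2.0, the function name, and the call durations are assumptions.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` population standard
    deviations from the mean. Returns [] if all values are equal."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical call durations in minutes; one call is unusually long.
durations = [3, 4, 5, 4, 3, 5, 4, 60]
suspicious = flag_outliers(durations)
print(suspicious)
```

A single extreme value inflates both the mean and the standard deviation, so robust variants (for example, based on the median) are often preferred on real call records.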
  19. 19. Fraud Detection & Mining Unusual Patterns (contd.) Credit card fraud: • Application fraud • Fake or doctored card • Lost and stolen card • Duplicate site • Intercept fraud (postal service). Mining to forecast failure of engineering students: Data mining can be used to identify the problems that affect engineering students and to find solutions to those problems.
  20. 20. Mining in insurance trade in CRM system: The amount of data stored in CRM databases is increasing rapidly, and much useful information is hidden in it. Data mining techniques can retrieve information about customer relationships in insurance from this data. Mining in cloud computing: Advantages of the cloud: • Reduced cost • Increased storage • Highly automated and high mobility. There are three types of cloud services: • IaaS (virtual machines, servers) • PaaS (execution runtime, database, web server) • SaaS (email, games)
  21. 21. Conclusions: Data mining involves extracting useful rules or interesting patterns from large volumes of historical data. Many data mining tasks are available, and each of them in turn has many techniques. Data mining is an interdisciplinary field that integrates artificial intelligence, databases, machine learning, statistics, etc. Data mining is the non-trivial process of extracting, from large amounts of incomplete, noisy, fuzzy, and random application data, hidden regularities that are not known in advance but are potentially useful and ultimately understandable information and knowledge.
  22. 22. References [1] V. Saurkar, Vaibhav, Bhujade (data mining techniques) [2] Amandeep Kaur Mann, Navneet Kaur (clustering) [3] Avinash Ingole, Dr. R. C. Thool (credit card fraud) [4] Parikshit Prasad, Rattan Lal (cloud computing) [5] Nasib Singh Gill, Rajan Gupta (financial statement fraud) [6] Komal S. Sahedani, B. Supriya Reddy (failure of engineering students) [7] C. Verhoef, Bas Donkers (insurance in CRM model)