A review paper on various data mining techniques
Survey on various types of credit fraud and security
Data mining in cloud computing
Survey paper on clustering techniques
A data mining framework for prevention and detection
of financial statement fraud
A review paper: mining educational data to forecast
failure of engineering students
Data mining model for insurance trade in CRM system
• Data mining is the exploration and analysis of large data sets in order to discover
meaningful patterns and rules.
• The objective of data mining is to design and work efficiently with large data sets.
• Data mining is one component of a wider process called knowledge discovery (KDD).
• Data mining is the process of analysing data from different perspectives and
summarizing the results as useful information.
• Data mining is a multi-step process that requires accessing and preparing the data,
applying a data mining algorithm, analysing the results and taking appropriate action.
Why Data mining?
• Database analysis and decision support
Market analysis and management: target marketing, customer relationship
management, market basket analysis, cross-selling, market segmentation
Risk analysis and management: forecasting, customer retention, improved
underwriting, quality control, competitive analysis
Fraud detection and management
• Other applications
Intelligent query answering
In data mining, the data is mined using two learning approaches: supervised
learning and unsupervised learning.
In supervised learning (often also called directed data mining), the variables
under investigation can be split into two groups: explanatory variables and a
dependent variable. The goal of the analysis is to specify a relationship between the
dependent variable and the explanatory variables, as is done in regression analysis.
In unsupervised learning, all the variables are treated in the same way; there is no
distinction between dependent and explanatory variables.
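As a minimal sketch of the supervised case, assuming a single explanatory variable and a single dependent variable, a least-squares line can be fitted in a few lines of Python. The function name `fit_line` and the spend/sales figures are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (explanatory) and sales (dependent).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")
```

In the unsupervised case there would be no `sales` column to predict; all columns would be treated alike, for example by a clustering method.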
Tasks Of Data Mining
Data mining is used as a term for six specific classes of activities or tasks:
Classification
Estimation
Prediction
Affinity grouping or association rules
Clustering
Description and visualization
The first three tasks (classification, estimation and prediction) are
examples of directed data mining or supervised learning. The next three
tasks are examples of undirected data mining.
Classification consists of examining the features of a newly
presented object and assigning it to a predefined class.
Estimation deals with continuously valued outcomes.
Any prediction can be thought of as classification or estimation.
Predictive tasks feel different because the records are classified according to
some predicted future behavior or estimated future value.
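The difference between classification (a predefined class as the outcome) and estimation (a continuously valued outcome) can be illustrated with one simple technique. The sketch below uses a 1-nearest-neighbour rule, which is an assumption chosen for brevity and is not named in the source. The customer records and the field names `risk_class` and `expected_spend` are hypothetical.

```python
def nearest(records, query):
    """Return the stored record whose features are closest to `query`."""
    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(records, key=lambda r: dist2(r["features"], query))


# Hypothetical customers: features = (age, income in thousands).
records = [
    {"features": (25, 30), "risk_class": "high", "expected_spend": 200.0},
    {"features": (45, 80), "risk_class": "low", "expected_spend": 900.0},
    {"features": (35, 55), "risk_class": "medium", "expected_spend": 500.0},
]


def classify(query):
    """Classification: assign one of the predefined classes."""
    return nearest(records, query)["risk_class"]


def estimate(query):
    """Estimation: return a continuously valued outcome."""
    return nearest(records, query)["expected_spend"]


new_customer = (43, 75)
print(classify(new_customer), estimate(new_customer))
```

If the looked-up value describes future behavior (for example, next year's spend), the same call acts as a prediction.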
An association rule is a rule which implies certain association
relationships among a set of objects in a database.
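An association rule such as "bread implies milk" is commonly scored by its support and confidence. These two measures are standard in the literature but are not named in the source, so the sketch below should be read as an illustration under that assumption. The basket data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market-basket data: one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```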
Clustering is the task of segmenting a diverse group into a number of
more similar subgroups or clusters. In clustering, there are no predefined classes.
General Types of Cluster
Well separated cluster
Shared property or conceptual cluster
Well separated cluster
A well-separated cluster is a set of points such that any point in the cluster is
nearer to every other point in the cluster than to any point that is not in the
cluster.
A center-based cluster is a set of objects such that an object in the cluster is
nearer to the “center” of its cluster than to the center of any other cluster. The
center of a cluster is often a centroid.
A contiguous cluster is a set of points such that a point in the cluster is nearer to
one or more other points in the cluster than to any point that is not in the cluster.
A density-based cluster is a dense region of points that is separated from other
high-density regions by low-density regions.
A shared-property or conceptual cluster is one whose members share some common
property or represent a particular concept.
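The center-based definition above, where each object belongs to the cluster with the nearest centroid, can be sketched with a basic k-means loop. k-means is one common algorithm of this kind and is an assumption here, since the source does not name one. The points and the starting centers are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the cluster with the nearest center.
            idx = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the centroid of its cluster (keep it if empty).
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```

No class labels are supplied anywhere in this sketch; the groups emerge from the distances alone, which is what distinguishes clustering from classification.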
Description and visualization
Data visualization is a powerful form of descriptive data mining. It is not
always easy to come up with meaningful visualizations, but the right picture really
can be worth a thousand association rules, since human beings are extremely
practiced at extracting meaning from visual scenes.
Steps of a KDD process
•Learning the application domain
relevant prior knowledge and goals of application
•Creating a target data set: data selection
•Data cleaning and preprocessing: (may take 60% of effort!)
•Data reduction and transformation
Find useful features, dimensionality/variable reduction, invariant representation
•Choosing functions of data mining
summarization, classification, regression, association, clustering
•Choosing the mining algorithm(s)
•Data mining: search for patterns of interest
•Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.
•Use of discovered knowledge
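The steps above can be sketched as a chain of small functions, one per stage. This is only an illustration under simplifying assumptions: the records, the field names, the age bands and the `min_spend` threshold are all hypothetical, and the mining step is a plain summarization.

```python
# Hypothetical raw records; one of them is incomplete.
raw = [
    {"customer": "a", "age": 34, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 51, "spend": 300.0},
    {"customer": "d", "age": 29, "spend": 40.0},
]


def select(rows):
    """Create the target data set: keep only the relevant fields."""
    return [{"age": r["age"], "spend": r["spend"]} for r in rows]


def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def transform(rows):
    """Data reduction: replace the exact age with a coarse age band."""
    return [
        {"age_band": "under_40" if r["age"] < 40 else "40_plus", "spend": r["spend"]}
        for r in rows
    ]


def mine(rows):
    """Data mining (summarization): mean spend per age band."""
    groups = {}
    for r in rows:
        groups.setdefault(r["age_band"], []).append(r["spend"])
    return {band: sum(v) / len(v) for band, v in groups.items()}


def evaluate(patterns, min_spend=100.0):
    """Pattern evaluation: keep only the patterns judged interesting."""
    return {band: m for band, m in patterns.items() if m >= min_spend}


patterns = evaluate(mine(transform(clean(select(raw)))))
print(patterns)
```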
Major Issues in Data Mining
•Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
•Performance: efficiency, effectiveness, and scalability
•Pattern evaluation: the interestingness problem
•Incorporation of background knowledge
•Handling noise and incomplete data
•Parallel, distributed and incremental mining methods
•Integration of the discovered knowledge with existing knowledge: knowledge fusion
Data mining in various fields
Market Analysis and Management
• Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
Find clusters of “model” customers who share the same characteristics:
interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
• Cross-market analysis—Find associations/co-relations between product sales,
and predict based on such associations
• Customer profiling—What types of customers buy what products
(clustering or classification)
•Customer requirement analysis
Identify the best products for different customers
Predict what factors will attract new customers
• Provision of summary information
Multidimensional summary reports
Statistical summary information (data central tendency and
Corporate Analysis & Risk Management
•Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
summarize and compare the resources and spending
monitor competitors and market directions
group customers into classes and a class-based pricing procedure
set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
•Approaches: Clustering & model construction for frauds, outlier analysis
•Applications: Health care, retail, credit card service, telecomm.
Auto insurance: ring of collisions
Money laundering: suspicious monetary transactions
Professional patients, ring of doctors, and ring of references
Unnecessary or correlated screening tests
Telecommunications: phone-call fraud
Phone call model: destination of the call, duration, time of day or
week. Analyze patterns that deviate from an expected norm
Credit card fraud
Fake doctored card
Lost and stolen card
Intercept fraud (postal service)
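The phone-call model described earlier in this section, which looks for patterns that deviate from an expected norm, can be sketched as a simple outlier test on call duration. The z-score rule, the threshold of 3 and the sample durations are assumptions made for illustration; a real fraud model would also use destination and time of day.

```python
from statistics import mean, stdev


def unusual_calls(history, new_calls, threshold=3.0):
    """Return the new call durations that deviate strongly from the norm.

    The norm is the mean of past durations; a call is flagged when it lies
    more than `threshold` sample standard deviations away from that mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    return [d for d in new_calls if abs(d - mu) / sigma > threshold]


# Hypothetical call durations in minutes for one subscriber.
history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
new_calls = [5.5, 48.0, 4.2]

flagged = unusual_calls(history, new_calls)
print(flagged)
```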
Mining to forecast failure of engineering students
Mining educational data helps to identify the problems that affect engineering
students and to find a solution to each particular problem.
Mining in Insurance trade in CRM system
The volume of data stored in CRM databases is increasing rapidly. Much useful
information is hidden in these databases. Using data mining techniques, we can
retrieve information about customer relationships in insurance.
Mining in cloud computing
Advantage in cloud:
Highly automated and high mobility
There are three types of services in the cloud
IaaS (virtual machines, servers)
Data mining involves extracting useful rules or interesting patterns from huge
volumes of historical data. Many data mining tasks are available, and each of them
has many techniques. Data mining is an interdisciplinary field that integrates
artificial intelligence, database technology, machine learning, statistics, etc.
Data mining is the non-trivial process of finding, in a large amount of incomplete,
noisy, fuzzy and random application data, hidden regularities that are not known by
people in advance but are potentially useful and ultimately understandable
information and knowledge.