1) The document discusses using data mining techniques like naive Bayesian classification to help enterprises better understand customer behavior and improve customer relationship management.
2) It proposes using a naive Bayesian classifier to classify customers into groups based on attributes like sex, age, income from a customer database to predict whether they will propose a credit card.
3) An example is provided applying the naive Bayesian classification algorithm to sample customer data to predict whether an unknown customer record would propose a credit card or not.
This document discusses data mining primitives, languages, and system architectures. It describes five primitives that define a data mining task: task-relevant data, type of knowledge to be mined, background knowledge, pattern interestingness measurements, and visualization of discovered patterns. It also discusses data mining query languages like DMQL that allow users to specify data mining tasks interactively. Finally, it covers different architectures for coupling data mining systems with database/data warehouse systems.
Three case studies deploying cluster analysisGreg Makowski
Three case studies are discussed, that include cluster analysis as a component.
1) Customer description for a credit card attrition model, to describe how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain, planning replenishment using 52 week demand curves using thousands of seasonal "profiles" or clusters.
This document discusses modeling claim amounts in insurance using probability distributions and simulations. It begins with an introduction to fitting distributions to insurance claims data. The key steps are outlined as selecting an appropriate loss distribution, estimating its parameters using maximum likelihood estimation, and testing the fit using goodness of fit tests. The Pareto distribution is discussed in more detail as it is commonly used to model insurance claim amounts. The document concludes by describing how to simulate claim amounts above a deductible using the Pareto distribution and calculate reinsurance premiums.
The document proposes a browsing-oriented approach to semantic faceted search to better support fuzzy information needs. It introduces browsing-oriented facet and facet value spaces, including an extended facet tree using clustering. It also proposes a browsing-oriented facet ranking approach based on notions of small steps, uniform steps, and comprehensible result segments. An evaluation finds the extended facet tree and browsing-oriented ranking improve effectiveness and efficiency for complex, fuzzy tasks compared to conventional faceted search approaches.
High-Dimensional Methods: Examples for Inference on Structural EffectsNBER
This document describes a study that uses high-dimensional methods to estimate the effect of 401(k) eligibility on measures of accumulated assets. It begins by outlining the baseline model and notes areas for improvement, such as controlling for income. It then discusses using regularization like LASSO for variable selection in high-dimensional settings. The document explores more flexible specifications by generating many interaction and polynomial terms but notes the need for dimension reduction. It describes using LASSO to select important variables from a large set. The results select a parsimonious set of variables and estimate similar 401(k) effects as the baseline.
ipas implicit password authentication system ieee 2011prasanna9
This document summarizes a proposed authentication system called the Implicit Password Authentication System (IPAS). IPAS aims to address weaknesses in existing authentication schemes like passwords, tokens, biometrics and graphical passwords. It proposes using a set of questions and answers during registration that are then implicitly embedded into images by the server during authentication. The server randomly selects questions and images, requiring the user to demonstrate knowledge of their prior answers without directly reproducing them. The system is intended for mobile banking but could generalize to other client-server environments.
This document summarizes a study on using nanorobots for treating patients with artery occlusion. It describes how chemical and thermal gradients could be used to control nanorobots to target areas needing treatment. Computer simulations were used to model the nanorobots' behavior and evaluate control strategies in coronary arteries. The document also outlines the potential medical applications of nanorobots and the manufacturing technologies needed to develop functional nanorobots for biomedical applications in the next 10-15 years.
Nanorobots in the medicinal world my ieeepriyasindhu
This document discusses the potential applications of nanorobots in medicine. It describes how nanorobots could be used to treat diseases like cancer, diabetes, and cardiovascular issues. Nanorobots would be small enough to navigate the human body and perform targeted drug delivery or surgical procedures. The document outlines some of the technical challenges in developing nanorobots, such as how to power them, how they would communicate data, and how to manufacture sensors and control systems at the nanoscale. It suggests nanorobots could be powered by kinetic energy harvested from blood flow and could transmit data using low-power radio frequency signals.
This document discusses data mining primitives, languages, and system architectures. It describes five primitives that define a data mining task: task-relevant data, type of knowledge to be mined, background knowledge, pattern interestingness measurements, and visualization of discovered patterns. It also discusses data mining query languages like DMQL that allow users to specify data mining tasks interactively. Finally, it covers different architectures for coupling data mining systems with database/data warehouse systems.
Three case studies deploying cluster analysisGreg Makowski
Three case studies are discussed, that include cluster analysis as a component.
1) Customer description for a credit card attrition model, to describe how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain, planning replenishment using 52 week demand curves using thousands of seasonal "profiles" or clusters.
This document discusses modeling claim amounts in insurance using probability distributions and simulations. It begins with an introduction to fitting distributions to insurance claims data. The key steps are outlined as selecting an appropriate loss distribution, estimating its parameters using maximum likelihood estimation, and testing the fit using goodness of fit tests. The Pareto distribution is discussed in more detail as it is commonly used to model insurance claim amounts. The document concludes by describing how to simulate claim amounts above a deductible using the Pareto distribution and calculate reinsurance premiums.
The document proposes a browsing-oriented approach to semantic faceted search to better support fuzzy information needs. It introduces browsing-oriented facet and facet value spaces, including an extended facet tree using clustering. It also proposes a browsing-oriented facet ranking approach based on notions of small steps, uniform steps, and comprehensible result segments. An evaluation finds the extended facet tree and browsing-oriented ranking improve effectiveness and efficiency for complex, fuzzy tasks compared to conventional faceted search approaches.
High-Dimensional Methods: Examples for Inference on Structural EffectsNBER
This document describes a study that uses high-dimensional methods to estimate the effect of 401(k) eligibility on measures of accumulated assets. It begins by outlining the baseline model and notes areas for improvement, such as controlling for income. It then discusses using regularization like LASSO for variable selection in high-dimensional settings. The document explores more flexible specifications by generating many interaction and polynomial terms but notes the need for dimension reduction. It describes using LASSO to select important variables from a large set. The results select a parsimonious set of variables and estimate similar 401(k) effects as the baseline.
ipas implicit password authentication system ieee 2011prasanna9
This document summarizes a proposed authentication system called the Implicit Password Authentication System (IPAS). IPAS aims to address weaknesses in existing authentication schemes like passwords, tokens, biometrics and graphical passwords. It proposes using a set of questions and answers during registration that are then implicitly embedded into images by the server during authentication. The server randomly selects questions and images, requiring the user to demonstrate knowledge of their prior answers without directly reproducing them. The system is intended for mobile banking but could generalize to other client-server environments.
This document summarizes a study on using nanorobots for treating patients with artery occlusion. It describes how chemical and thermal gradients could be used to control nanorobots to target areas needing treatment. Computer simulations were used to model the nanorobots' behavior and evaluate control strategies in coronary arteries. The document also outlines the potential medical applications of nanorobots and the manufacturing technologies needed to develop functional nanorobots for biomedical applications in the next 10-15 years.
Nanorobots in the medicinal world my ieeepriyasindhu
This document discusses the potential applications of nanorobots in medicine. It describes how nanorobots could be used to treat diseases like cancer, diabetes, and cardiovascular issues. Nanorobots would be small enough to navigate the human body and perform targeted drug delivery or surgical procedures. The document outlines some of the technical challenges in developing nanorobots, such as how to power them, how they would communicate data, and how to manufacture sensors and control systems at the nanoscale. It suggests nanorobots could be powered by kinetic energy harvested from blood flow and could transmit data using low-power radio frequency signals.
1) The document discusses using data mining techniques to analyze customer attrition for a retail bank. It outlines the process used, including problem definition, data selection, modeling with techniques like decision trees and neural networks, and field testing results.
2) Key steps in the process included data preprocessing, statistical analysis, feature selection, modeling with classifiers like boosted Bayesian networks, and using the model to predict likely attriters to target retention efforts.
3) Field testing found the top of the attrition list identified by the ensemble model did contain concentrated attriters, and the data mining approach was effective for the bank's retention purposes.
This document provides an overview of big data analytics and data visualization. It discusses key concepts like data wrangling, exploring patterns, drawing conclusions, and communicating findings. Common techniques are also summarized, including classification, clustering, association rules, and predictive analytics. Specific algorithms like decision trees, k-means clustering, and hierarchical clustering are explained. The CRISP-DM process model and applications of analytics in areas like customer understanding and process optimization are also covered at a high level. Visualization is presented as an important part of the overall analytics process.
Data mining involves discovering patterns from large amounts of data. It can be used for applications like credit ratings, targeted marketing, fraud detection, and customer relationship management. Some common data mining techniques include classification, clustering, regression, and association rule mining. Decision trees are a popular classification technique that uses a tree structure with internal nodes representing attributes and leaf nodes representing target classes.
A Review on Subjectivity Analysis through Text Classification Using Mining Te...IJERA Editor
The increased use of web for expressing ones opinion has resulted in to an enhanced amount of subjective content available in the Web. These contents can often be categorized as social content like movie or product reviews, Customer Feedbacks, Blogs, Communication exchange in discussion forums etc. Accurate recognition of the subjective or sentimental web content has a number of benefits. Understanding of the sentiments of human masses towards different entities and products enables better services for contextual advertisements, recommendation systems and analysis of market trends. The objective behind framing this paper to analyze various sentiment based classification techniques which can be utilized for quick estimation of subjective contents of Political reviews based on politicians speech. The paper elaborately discusses supervised machine learning algorithm: Naïve Bayes classification and compares its overall accuracy, precisions as well as recall values.
This document provides an overview of business analytics concepts. It defines business analytics as the skills, technologies, and practices used to explore past business performance and gain insights to drive planning. The document outlines a business analytics model with layers moving from strategic goals to technical data sources. It describes types of analytics including descriptive, predictive, and prescriptive. Finally, it discusses the importance of business analytics in decision making, understanding data, providing results, and managing data.
This document discusses using data mining techniques like clustering and classification to segment customers for shipping enterprises. It proposes using k-means clustering on historical freight data to divide customers into segments. A Bayesian network classifier is then used to classify new customers based on the clustering results. The goal is to support marketing decisions by identifying the most valuable customers to focus on through scientific customer segmentation.
This document discusses analytics in cross-selling in the retail banking sector. It outlines different approaches to cross-selling that leverage analytics, such as predictive analytics based on customer data models, rules-based approaches, and value-based approaches. It also discusses challenges in using analytics for cross-selling like lack of expertise, need for clean data, and operational difficulties. Emerging trends in analytics that could improve cross-selling are discussed, such as demand for packaged analytic applications and interest in real-time and advanced analytics.
The document summarizes a study that examined using Shopee data as a data source for business intelligence to understand consumer preferences. It developed a demographic dashboard to visualize consumer data from Shopee over a 6-month period, including sales, orders, visits, and pageviews. Simple linear regression was used to forecast future values of these metrics over the next 6 months. The experts evaluated the usability, sustainability, and maintainability of the dashboard. Their feedback can help improve the business intelligence and inform better decision making.
This document provides instructions for an assignment for an MBA program. It tells students to submit their assignments by August 16th electronically and August 17th as a hard copy. It warns that assignments submitted late will lose 10% per day. It also warns against plagiarism and instructs students to reference sources properly. The assignment contains 8 questions worth a total of 60 marks covering topics like mobile applications, data vs. information, information systems for a growing company, package tracking systems, RFID, GIS, CRM systems, analyzing internet usage statistics, e-commerce website concerns, and benefits of technologies like OLAP and cloud computing.
Dear students get fully solved assignments
Send your semester & Specialization name to our mail id :
help.mbaassignments@gmail.com
or
call us at : 08263069601
Data Mining Concepts with Customer Relationship ManagementIJERA Editor
Data mining is important in creating a great experience at e-business. Data mining is the systematic way of extracting information from data. Many of the companies are developing an online internet presence to sell or promote their products and services. Most of the internet users are aware of on-line shopping concepts and techniques to own a product. The e-commerce landscape is the relation between customer relationship management (sales, marketing & support), internet and suppliers.
This document discusses customer value modeling from a business intelligence perspective. It defines customer value modeling as a data-driven representation of the monetary worth that a company provides to its customers. Business intelligence tools are instrumental in customer value modeling by quantifying customer benefits in monetary terms based on product features. The document also outlines several methods for creating customer value models, including reverse engineering customer profit and loss statements. It emphasizes the importance of substantial customer interaction to understand how products and services create value for customers.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISKIRJET Journal
This document discusses using machine learning classifiers to analyze credit risk. It examines various machine learning techniques for credit risk analysis, including Bayesian classifiers, naive Bayes, decision trees, k-nearest neighbors, multilayer perceptrons, support vector machines, and ensemble methods like bagging and boosting. Two credit datasets from the UCI machine learning repository were used to test the accuracy of these classifiers. The results showed decision trees had the highest accuracy at 89.9% and 71.25% on the two datasets, while k-nearest neighbors had the lowest. Future work could involve rebuilding the models with more accurate data to improve performance. The objective of credit risk analysis is to help banks and financial institutions balance approving loans to creditworthy borrowers
Value creation with big data analytics for enterprises: a surveyTELKOMNIKA JOURNAL
The emergence of Big Data applications has paved the way for enterprises to use Big Data as a value-creation strategy for their business; however, the majority of enterprises fail to know how to generate value from their massive volumes of data. Big Data Analytics results can help the enterprises in better decision-making and provide them with additional profits. Studying different researches dedicated to value creation through Big Data Analytics. This paper (a) highlights the current state of the art proposed for creating value from Big Data Analytics, (b) identifies the essential factors and discusses their effects upon value creation, and (c) provides a classification of the cutting-edge technologies in this field.
This document discusses the importance and benefits of business analytics for organizations. It defines analytics as the discovery and communication of meaningful patterns in data to quantify and improve business performance. Analytics can help organizations better understand customers, detect fraud, reduce risks, and optimize processes. The document outlines different types of analytics like descriptive, predictive, clustering and affinity grouping. It argues that while organizations now have large amounts of data, analytics is needed to extract useful insights and make better decisions. Overall, the document promotes the use of business analytics for sustainability, differentiation, and as a critical success factor for organizations.
Competitive Environment And Maintaining Customer...Lissette Hartman
Waitrose is interested in analyzing customer data to better understand online customers and identify high value customers. Data mining techniques will be used to analyze purchasing data and discover patterns to define a high value customer profile. Maintaining strong customer relationships is key to business success in a competitive environment by treating different customer subgroups differently based on their needs.
Kislaya Kumar Singh provides a summary of his career objective, education, experience, technical skills, projects, and work experience. He has a Post Graduate Program in Business Analytics and Business Intelligence in progress and a Bachelor's degree in Information Science. His experience includes data modeling, machine learning, data analysis tools like Python and R, and statistical modeling. Some of his projects include analyzing cricket data, factor analysis of cereal brands, clustering engineering colleges, and predictive modeling of employee attrition. He has worked as a Technical Lead for Western Union and as a Software Engineer for Symantec, where his responsibilities included API testing, incident management, and executing various security and code coverage tools.
This document discusses analytics and information architecture. It begins by describing how analytics workloads are moving away from data warehouses to more specialized platforms. It then discusses what distinguishes analytics from reporting, including that analytics involve complex summaries of information and linking analyses to business actions. The document examines various data platforms used for analytics and contends that ParAccel Analytic Database is well-suited for analytics workloads due to its columnar structure, compression, SQL support, and ability to utilize Hadoop data without replication. It concludes by proposing an information architecture with Hadoop for big data, ParAccel for analytics, and data warehouses for operational support.
The document discusses several myths about data mining. It summarizes that data mining is not instant predictions from a crystal ball, but rather a multi-step process requiring clean data. It also notes that data mining is a viable technology for businesses that can provide insights regardless of company size or amount of customer data. Advanced algorithms are not the only important aspect of data mining, as business knowledge is also essential.
1) The document discusses using data mining techniques to analyze customer attrition for a retail bank. It outlines the process used, including problem definition, data selection, modeling with techniques like decision trees and neural networks, and field testing results.
2) Key steps in the process included data preprocessing, statistical analysis, feature selection, modeling with classifiers like boosted Bayesian networks, and using the model to predict likely attriters to target retention efforts.
3) Field testing found the top of the attrition list identified by the ensemble model did contain concentrated attriters, and the data mining approach was effective for the bank's retention purposes.
This document provides an overview of big data analytics and data visualization. It discusses key concepts like data wrangling, exploring patterns, drawing conclusions, and communicating findings. Common techniques are also summarized, including classification, clustering, association rules, and predictive analytics. Specific algorithms like decision trees, k-means clustering, and hierarchical clustering are explained. The CRISP-DM process model and applications of analytics in areas like customer understanding and process optimization are also covered at a high level. Visualization is presented as an important part of the overall analytics process.
Data mining involves discovering patterns from large amounts of data. It can be used for applications like credit ratings, targeted marketing, fraud detection, and customer relationship management. Some common data mining techniques include classification, clustering, regression, and association rule mining. Decision trees are a popular classification technique that uses a tree structure with internal nodes representing attributes and leaf nodes representing target classes.
A Review on Subjectivity Analysis through Text Classification Using Mining Te...IJERA Editor
The increased use of web for expressing ones opinion has resulted in to an enhanced amount of subjective content available in the Web. These contents can often be categorized as social content like movie or product reviews, Customer Feedbacks, Blogs, Communication exchange in discussion forums etc. Accurate recognition of the subjective or sentimental web content has a number of benefits. Understanding of the sentiments of human masses towards different entities and products enables better services for contextual advertisements, recommendation systems and analysis of market trends. The objective behind framing this paper to analyze various sentiment based classification techniques which can be utilized for quick estimation of subjective contents of Political reviews based on politicians speech. The paper elaborately discusses supervised machine learning algorithm: Naïve Bayes classification and compares its overall accuracy, precisions as well as recall values.
This document provides an overview of business analytics concepts. It defines business analytics as the skills, technologies, and practices used to explore past business performance and gain insights to drive planning. The document outlines a business analytics model with layers moving from strategic goals to technical data sources. It describes types of analytics including descriptive, predictive, and prescriptive. Finally, it discusses the importance of business analytics in decision making, understanding data, providing results, and managing data.
This document discusses using data mining techniques like clustering and classification to segment customers for shipping enterprises. It proposes using k-means clustering on historical freight data to divide customers into segments. A Bayesian network classifier is then used to classify new customers based on the clustering results. The goal is to support marketing decisions by identifying the most valuable customers to focus on through scientific customer segmentation.
This document discusses analytics in cross-selling in the retail banking sector. It outlines different approaches to cross-selling that leverage analytics, such as predictive analytics based on customer data models, rules-based approaches, and value-based approaches. It also discusses challenges in using analytics for cross-selling like lack of expertise, need for clean data, and operational difficulties. Emerging trends in analytics that could improve cross-selling are discussed, such as demand for packaged analytic applications and interest in real-time and advanced analytics.
The document summarizes a study that examined using Shopee data as a data source for business intelligence to understand consumer preferences. It developed a demographic dashboard to visualize consumer data from Shopee over a 6-month period, including sales, orders, visits, and pageviews. Simple linear regression was used to forecast future values of these metrics over the next 6 months. The experts evaluated the usability, sustainability, and maintainability of the dashboard. Their feedback can help improve the business intelligence and inform better decision making.
This document provides instructions for an assignment for an MBA program. It tells students to submit their assignments by August 16th electronically and August 17th as a hard copy. It warns that assignments submitted late will lose 10% per day. It also warns against plagiarism and instructs students to reference sources properly. The assignment contains 8 questions worth a total of 60 marks covering topics like mobile applications, data vs. information, information systems for a growing company, package tracking systems, RFID, GIS, CRM systems, analyzing internet usage statistics, e-commerce website concerns, and benefits of technologies like OLAP and cloud computing.
Dear students get fully solved assignments
Send your semester & Specialization name to our mail id :
help.mbaassignments@gmail.com
or
call us at : 08263069601
Data Mining Concepts with Customer Relationship ManagementIJERA Editor
Data mining is important in creating a great experience at e-business. Data mining is the systematic way of extracting information from data. Many of the companies are developing an online internet presence to sell or promote their products and services. Most of the internet users are aware of on-line shopping concepts and techniques to own a product. The e-commerce landscape is the relation between customer relationship management (sales, marketing & support), internet and suppliers.
This document discusses customer value modeling from a business intelligence perspective. It defines customer value modeling as a data-driven representation of the monetary worth that a company provides to its customers. Business intelligence tools are instrumental in customer value modeling by quantifying customer benefits in monetary terms based on product features. The document also outlines several methods for creating customer value models, including reverse engineering customer profit and loss statements. It emphasizes the importance of substantial customer interaction to understand how products and services create value for customers.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISKIRJET Journal
This document discusses using machine learning classifiers to analyze credit risk. It examines various machine learning techniques for credit risk analysis, including Bayesian classifiers, naive Bayes, decision trees, k-nearest neighbors, multilayer perceptrons, support vector machines, and ensemble methods like bagging and boosting. Two credit datasets from the UCI machine learning repository were used to test the accuracy of these classifiers. The results showed decision trees had the highest accuracy at 89.9% and 71.25% on the two datasets, while k-nearest neighbors had the lowest. Future work could involve rebuilding the models with more accurate data to improve performance. The objective of credit risk analysis is to help banks and financial institutions balance approving loans to creditworthy borrowers
Value creation with big data analytics for enterprises: a surveyTELKOMNIKA JOURNAL
The emergence of Big Data applications has paved the way for enterprises to use Big Data as a value-creation strategy for their business; however, the majority of enterprises fail to know how to generate value from their massive volumes of data. Big Data Analytics results can help the enterprises in better decision-making and provide them with additional profits. Studying different researches dedicated to value creation through Big Data Analytics. This paper (a) highlights the current state of the art proposed for creating value from Big Data Analytics, (b) identifies the essential factors and discusses their effects upon value creation, and (c) provides a classification of the cutting-edge technologies in this field.
This document discusses the importance and benefits of business analytics for organizations. It defines analytics as the discovery and communication of meaningful patterns in data to quantify and improve business performance. Analytics can help organizations better understand customers, detect fraud, reduce risks, and optimize processes. The document outlines different types of analytics like descriptive, predictive, clustering and affinity grouping. It argues that while organizations now have large amounts of data, analytics is needed to extract useful insights and make better decisions. Overall, the document promotes the use of business analytics for sustainability, differentiation, and as a critical success factor for organizations.
Competitive Environment And Maintaining Customer...Lissette Hartman
Waitrose is interested in analyzing customer data to better understand online customers and identify high value customers. Data mining techniques will be used to analyze purchasing data and discover patterns to define a high value customer profile. Maintaining strong customer relationships is key to business success in a competitive environment by treating different customer subgroups differently based on their needs.
Kislaya Kumar Singh provides a summary of his career objective, education, experience, technical skills, projects, and work experience. He has a Post Graduate Program in Business Analytics and Business Intelligence in progress and a Bachelor's degree in Information Science. His experience includes data modeling, machine learning, data analysis tools like Python and R, and statistical modeling. Some of his projects include analyzing cricket data, factor analysis of cereal brands, clustering engineering colleges, and predictive modeling of employee attrition. He has worked as a Technical Lead for Western Union and as a Software Engineer for Symantec, where his responsibilities included API testing, incident management, and executing various security and code coverage tools.
This document discusses analytics and information architecture. It begins by describing how analytics workloads are moving away from data warehouses to more specialized platforms. It then discusses what distinguishes analytics from reporting, including that analytics involve complex summaries of information and linking analyses to business actions. The document examines various data platforms used for analytics and contends that ParAccel Analytic Database is well-suited for analytics workloads due to its columnar structure, compression, SQL support, and ability to utilize Hadoop data without replication. It concludes by proposing an information architecture with Hadoop for big data, ParAccel for analytics, and data warehouses for operational support.
The document discusses several myths about data mining. It summarizes that data mining is not instant predictions from a crystal ball, but rather a multi-step process requiring clean data. It also notes that data mining is a viable technology for businesses that can provide insights regardless of company size or amount of customer data. Advanced algorithms are not the only important aspect of data mining, as business knowledge is also essential.
2. services and management. In this study, we suggest a depicting n measurements made on the sample from
customer classification and prediction model in commercial n attributes, respectively, A1 , A2 ,..., An .
bank that uses collected information of customers as inputs to 2. Suppose that there are m classes,
make a prediction for credit card proposing. In particular, we
C1 , C 2 ,..., C m . Given an unknown data sample, X (having
chose Naive Bayesian classifier from the various data mining
methods because it is easy to apply and maintain. no class label) ,the classifier will predict that X belongs to
The paper is organized as follows. Section 2 provides a the class having the highest posterior probability, conditioned
brief review of previous research and the next section on X . That is, the naïve Bayesian classifier assigns an
describes our proposed model, the Naive Bayesian classifier
unknown sample X to the class C i if and only if
approach. In Section 4, the research of the example is given to
apply the algorithm to customer classification and prediction
P(C i X ) > P(C j X ) for 1 ≤ j ≤ m, j ≠ i .
in commercial bank. The final section presents the
contributions and future researches of this study.
Thus we maximize P(C i X ) . The class C i for which
II. LITERATURE REVIEW
P(C i X ) is maximized is called the maximum posteriori
In recent years, data mining has gained widespread
attention and increasing popularity in the commercial world, hypothesis. By Bayes theorem,
as in [3],[4]. According to the professional and trade literature,
P( X C i ) P(C i )
more companies are using data mining as the foundation for P(C i X ) =
P( X )
strategies that help them outsmart competitors, identify new
customers and lower costs, as in [5]. In particular, data mining
3. As P( X ) is constant for all classes, only
is widely used in marketing, risk management and fraud
control, as in [6].
P( X C i ) P(C i ) need be maximized. If the class prior
Although recent surveys found that data mining had
grown in usage and effectiveness, data mining applications in probabilities are not known, then it is commonly assumed that
the commercial world have not been widely. Literature about the classes are equally likely, that is,
data mining applications in the fields about economic and
P(C1) = P(C 2) = ... = P(C m) and we would therefore
management are still few. With the realization of importance
of business intelligence, we need to strengthen the research on
maximize P( X C i ) . Otherwise, we maximize
data mining applications in the commercial world.
P( X C i ) P(C i ) . Note that the class prior probabilities may
III. NAIVE BAYESIAN CLASSIFIER ALGORITHM
Bayesian classification is based on Bayes theorem. Studies si
be estimated by P(C i ) = , where si is the number of
comparing classification algorithms have found a simple s
Bayesian classifier known as the naive Bayesian classifier to training samples of class C i , and s is the total number of
be comparable in performance with decision and neural training samples.
network classifiers. Bayesian classifiers have also exhibited 4. Given data sets with many attributes, it would be
high accuracy and speed when applied to large databases.
extremely computationally expensive to compute P( X C i ) .
The naive Bayesian classifier works as follows, as in[7]:
1. Each data sample is represented by an
In order to reduce computation in evaluating P( X C i ) , the
n-dimensional feature vector X = ( x1 , x2 ,..., x n) ,
naive assumption of class conditional independence is made.
3. This presumes that the values of the attributes are Suppose commercial banks hope to increase the customers
conditionally independent of one another, given the class label who will propose credit card. There is a large number of
of the sample, that is, there are no dependence relationships valuable customer information in huge amounts of data
among the attributes. Thus, accumulated by commercial banks, which is used to identify
n
customers and provide decision support. We wish to predict
P( X C i = ∏ P ( x k C i ) . the class label of an unknown sample using naive Bayesian
k =1
classification, given the training data as Table 1. The data
samples are described by the attributes: sex, age, student and
The probabilities P( x1 C i ) , P( x2 C i ) …,
income. The class label attribute, creditcard_proposing has
two distinct values(namely,{yes, no}).
P( x n C i ) can be estimated from the training samples, where
Table 1 Training data from the customer database
RID sex age Student income credit card_
sik , where
(a) If Ak is categorical, then P ( x k C i) = proposing
si 1 male >45 no high yes
sik Ci
is the number of training sample of class having the 2 female 31~45 no high yes
value x k forAk , and si is the number of training samples 3 female 20~30 yes low yes
belonging to C i . 4 male <20 yes low no
(b) If Ak is continuous-valued, then the attribute is 5 female 20~30 yes medium no
typically assumed to have a Gaussian distribution so that 6 female 20~30 no medium yes
P( x k C i ) = g ( x k , μ , ) 7 female 31~45 no high yes
Ci σ Ci
2 8 male 31~45 yes medium no
(xk −μ )
1 −
Ci
9 male 31~45 no medium yes
=
2π σ C i
e 2 σ Ci
2
10 female <20 yes low yes
11 female 31~45
g ( xk , μ , ) is the normal density function for no medium yes
Ci σ Ci
Where
12 male 20~30 no medium no
13 male <20
Ak , while μ C i and σ C i are the mean and
yes low no
attribute
14 female >45 no high no
standard deviation, respectively, given the values for attribute 15 male 20~30 yes low yes
Ak for training samples of class C i . C1 correspond to the class creditcard_proposing
Let
5. In order to classify an unknown sample X , =”yes” and C2 correspond to the class
creditcard_proposing=”no”. The unknown sample we wish
P( X C i ) P(C i ) is evaluated for each class C i . Sample X
to classify is
is then assigned to the class Ci if and only if
X = ( sex =" female" , age ="31 ~ 45" ,
P ( X C i ) P (C i ) > P ( X C j ) P (C j ) student =" no" , income =" medium" )
for 1 ≤ j ≤ m, j ≠ i . We need to maximize P( X C i ) P(C i ) , for i = 1,2 .
In other words, it is assigned to the class C i for which
P(C i ) , the prior probability of each class, can be computed
P( X C i ) P(C i ) is the maximum.
based on the training samples:
P (creditcard _ propo sin g =" yes" ) = 10 / 15 = 0.667
IV AN EXAMPLE P (creditcard _ propo sin g =" no" ) = 5 / 15 = 0.333
4. are faced with lots of data from databases, we can make
To compute P( X C i ) , for i = 1,2 , we compute the
classification and prediction by special data mining software
following conditional probabilities: as SPSS Clementine or SQL server 2005.
P(sex =" female" creditcard _ propo sin g =" yes" )
V. CONCLUSION
=7/10=0.7 Data mining provides the technology to analyze mass
volume of data and/or detect hidden patterns in data to
P ( sex =" female" creditcard _ propo sin g =" no" )
convert raw data into valuable information. This paper mainly
=1/5=0.2 focused on the research of the customer classification and
prediction in Customer Relation Management concerned with
P ( age ="31 ~ 45" creditcard _ propo sin g =" yes" )
data mining based on Naive Bayesian classification algorithm,
=4/10=0.4 which have a try to the optimization of the business process.
The study will help the company to analyze and forecast
P ( age ="31 ~ 45" creditcard _ propo sin g =" no" )
customer’s pattern of consumption, and the premise of
=1/5=0.2 personalized marketing services and management. Although
the paper focuses mainly on the banking industry, the issues
P( student =" no" creditcard _ propo sin g =" yes" )
and applications discussed are applicable to other industries,
=7/10=0.7 such as insurance industry, retail industry, manufacture
industries, and so on.
P( student =" no" creditcard _ propo sin g =" no" )
=1/5=0.2 REFERENCES
[1] Jenkins D(1999). “Customer relationship management and the data
P(income =" medium" creditcard _ propo sin g =" yes" ) =
warehouse” [J]. Call center Solutions, 1892):88-92
3/10=0.3 [2] Chung HM and P Gray(1999). “Data mining” [J]. Journal of
MIS,16(1):11-13
P(income =" medium" creditcard _ propo sin g =" no" ) =3/
[3]HOKEY MIN, Developing the Profiles of Supermarket Customers through
5=0.6 Data Mining[J], The Service Industries Journal, Vol.26, No.7, October 2006,
Using the above probabilities, we obtain pp.747–763
P( X creditcard _ propo sin g =" yes" ) [4] Balaji Padmanabhan, Alexander Tuzhilin , On the Use of Optimization for
= 0.7 × 0.4 × 0.7 × 0.3 = 0.0588 Data Mining: Theoretical Interactions and eCRM Opportunities[J],
P( X creditcard _ propo sin g =" no" ) Management Science . Vol. 49, No. 10, October 2003, pp. 1327 1343
= 0.2 × 0.2 × 0.2 × 0.6 = 0.0048 [5] Davis B(1999). “Data mining transformed” [J]. Informationweek,
P( X creditcard _ propo sin g =" yes" ) P(creditcard _ propo sin g =" yes" ) 751:86-88
= 0.0588 × 0.667 = 0.0392 [6] Kuykendall L(1999).”The data-mining toolbox”[J]. Credit Card
P( X creditcard _ propo sin g =" no" ) P(creditcard _ propo sin g =" no" ) Management,12(6):30-40
= 0.0048 × 0.333 = 0.0016 [7] Han, J. and Kamber M., (2001) Data Mining: Concepts and
Therefore, we predicts creditcard_proposing=”yes” for Techniques[M], San Francisco, CA: Morgan Kaufmann Publishers
sample X. Of course, the example above only illustrates the [8] Kantardzic, M. (2003) Data Mining: Concepts, Models, Methods, and
Bayesian classification algorithm with training data. When we Algorithms[M], Piscataway, NJ: IEEE Press.