1. Data Mining Tasks / Operations
Managers need to perform a variety of tasks on data in order to formulate the best course of action and take decisions. These tasks are described in detail in this unit.
2. Data mining tasks/operations relevant to CRM
1. Classification: mapping a given data item into one of several predefined classes. (It is a predictive technique.)
2. Regression: predicting the value of a dependent variable based on the values of one or more independent variables.
3. Link Analysis: establishing relationships between items or variables in database records to expose patterns and trends. Examples: association rules, sequential patterns, time sequences.
4. Segmentation: identifying a finite set of naturally occurring clusters or categories to describe the data (clustering).
5. Deviation Detection: discovering the most significant changes in the data from previously measured or expected values.
3. 1. Classification
• Classification is perhaps the most basic form of data analysis.
• Classification is a process that maps a given data item into one of several predefined classes (Weiss & Kulikowski, 1991).
• Example: a retail store allows customers to purchase on credit and pay later in installments. Question: which customers should receive this facility? Answer: use a classification technique to predict whether a customer is creditworthy.
• CRM uses classification for a variety of purposes, such as behavior prediction and product and customer categorization. For example, the recipient of an offer can respond or not respond; an applicant for a loan can repay on time, repay late or declare bankruptcy; a credit card transaction can be normal or fraudulent. Classification helps in all of these cases.
• Classification is meaningful only if we have predefined classes.
• In classification we are trying to give the correct label to a given input.
• It is a directed (supervised) data mining technique (there is a target variable that the manager is interested in), and it is predictive.
4. Classification
Goal of classification: mapping previously unseen records to a predefined class as accurately as possible.
1. Start from a collection of historical records (used as a training set).
2. Each record contains a set of attributes; one of the attributes is the ‘CLASS’.
3. Build a model for the class attribute as a function of the values of the other attributes.
4. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
5. New customers can then be classified. Example with two classes: those likely to default on installment payments and those unlikely to default.
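The five steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the single `debt_ratio` attribute and the one-threshold rule are all hypothetical and are not taken from the deck.

```python
# Hypothetical historical records: (debt_ratio, class), where class is
# "default" or "ok". The values are invented purely for illustration.
records = [
    (0.10, "ok"), (0.15, "ok"), (0.22, "ok"), (0.30, "ok"),
    (0.35, "ok"), (0.48, "default"), (0.55, "default"), (0.60, "default"),
    (0.28, "ok"), (0.52, "default"), (0.40, "ok"), (0.70, "default"),
]

# Steps 1, 2 and 4: divide the labelled history into a training and a test set.
train, test = records[:8], records[8:]


def predict(threshold, debt_ratio):
    """Classify one record with a single-threshold rule."""
    return "default" if debt_ratio > threshold else "ok"


def accuracy(threshold, data):
    """Share of records in `data` whose class the rule predicts correctly."""
    hits = sum(predict(threshold, x) == label for x, label in data)
    return hits / len(data)


def fit(data):
    """Step 3: pick the threshold that is most accurate on the training set."""
    values = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, data))


threshold = fit(train)
test_accuracy = accuracy(threshold, test)       # step 4: validate on unseen records
new_customer_class = predict(threshold, 0.65)   # step 5: classify a new customer
print(threshold, test_accuracy, new_customer_class)
```

Real classifiers use many attributes and richer models, but the flow is the same: learn from the training set, check accuracy on the test set, then label new records.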
6. Example of use of classification in the Income Tax department
• Goal: identify people who are likely to have underpaid tax.
• Classify people (historical data) into 2 groups: those who pay the required tax (do not cheat) and those who underpay tax (cheat).
• Variables used to build the model: claim for tax refund, marital status, taxable income.
• The historical data is divided into a training set and a test set. The model is built with the training set, and the accuracy of its predictions is checked with the test set.
7. Example of use of Classification (Income tax)

Training Set (used to learn the classifier model):
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set (class to be predicted by the model):
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set -> Learn Classifier -> Model; the Model is then applied to the Test Set]
8. Classification: Applications
Direct Marketing
Goal: Reduce the cost of mailing by targeting the set of consumers most likely to buy a new product.
Approach:
1. Use the data for a similar product introduced before.
2. We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
3. Collect various demographic, lifestyle and company-interaction related information about all such customers.
4. Use this information as input attributes to learn a classifier model.
5. Apply the model to new data and send mails only to those who are predicted to buy.
Credit Rating
Goal: Predict who will default on payments for a bank loan or the credit scheme of a retail store.
Approach:
1. Use demographic and behavioral data as attributes: age, salary, marital status, tax paid, etc.
2. Label past customers as low risk or high risk (class attribute).
3. Learn a model for the class attribute.
4. Use this model to detect possible defaulters.
5. Extend loans only to customers in the ‘low risk’ class.
10. Regression Basics
• Managerial decisions are often based on the relationship between two or more variables.
• Regression is a data mining task that helps us find the relation between a dependent variable (Y) and one (X) or more (X1, X2, X3, ...) independent variables.
• We can quantify the strength of the relationship between the variables.
• This can then be used to predict the value of Y, given the value of X.
• It is a predictive technique.
• It is a directed data mining technique.
• Both dependent and independent variables are numeric in linear regression.
11. WHERE DO WE USE REGRESSION?
• Companies would like to know about factors significantly associated with their Key
Performance Indicators in order to be in a position to predict them.
• Marketing – Sales, Customer Satisfaction, Churn, New product acceptance
• Finance – Credit-risk
• HRM – Attrition, Job satisfaction
• Operations – efficiency
12. Regression uses in CRM
• The prediction of sales using data of ad spending,
price (FMCG)
• Predicting demand with price (airline, FMCG etc)
• To predict customer churn based on customer bill
details. (Mobile)
• Model sales with variables like hours when shop is
open, discounts given etc (retail).
13. Regression basics
• Dependent (response/outcome, Y) and independent (explanatory/input, X) variables
• Scatter plot
• Line of best fit: the relationship between the two variables is approximated by a straight line, y = b0 + b1x + e, where e is the error term.
• b0: the intercept of the regression line with the y-axis. It is the value of Y when X = 0.
• b1: the SLOPE of the regression line. It is the amount by which the dependent variable Y changes for each 1-unit change in the X variable.
Linear regression calculates the equation that minimizes the total squared vertical distance between the fitted line and the data points (the least-squares criterion).
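As a rough sketch of how the line of best fit is obtained, the intercept b0 and slope b1 can be computed with the standard least-squares formulas. The ad-spend and sales figures below are hypothetical and serve only to exercise the function.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = covariance(x, y) / variance(x); b0 makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: X = ad spend, Y = sales (arbitrary units).
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(ad_spend, sales)
predicted_sales = b0 + b1 * 6   # predict Y for a new value of X
print(b0, b1, predicted_sales)
```

The fitted slope is read exactly as on the slide: each extra unit of X changes the predicted Y by b1.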
14. Find the answer
• If the slope of the regression line is calculated to be 2.5 and the
intercept 16 then what is the value of Y (dependent variable) when X
(independent) is 4?
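The answer follows by substituting into the regression equation y = b0 + b1x (the error term is ignored for a point prediction):

```latex
\[
  y = b_0 + b_1 x = 16 + 2.5 \times 4 = 26
\]
```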
15. Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line of E(y) against x, with intercept b0 and positive slope b1]
Corporate example: an increase in end-retailer knowledge of offers is positively correlated with pre-paid recharge sales in mobile.
16. Simple Linear Regression Equation
Negative Linear Relationship
[Figure: regression line of E(y) against x, with intercept b0 and negative slope b1]
Example: the price of a product and the demand for it are negatively correlated.
18. 3. LINK ANALYSIS
It is a data mining technique that addresses relationships and
connections
19. Link Analysis
• Based on a branch of mathematics called graph theory, which represents relationships between different objects as edges in a graph.
• Nodes (vertices): the things in the graph that have relationships, e.g. people, organizations, objects, places, transactions, etc.
• Edges: the relationships connecting the nodes.
21. Link Analysis
• Link analysis seeks to establish relationships between items or variables in a database to expose patterns and trends.
• It can also trace connections between items in records over a period of time. It is mostly undirected.
• It helps in visualizing relationships. It is a descriptive technique.
• It has limited application: it cannot solve all types of problems.
• It yields good results in the following CRM-related applications:
  • Analyzing telephone call patterns
  • Analyzing links between cities / places travelled to by customers (airline, transportation companies)
  • Understanding referral patterns
  • Analyzing patterns of purchase in a shopping cart (characterizing product affinities in retail purchases), the most common use
22. Understanding referral patterns
Link analysis helps us to analyze referral patterns.
This helps us to locate the most influential customers and target them with offers.
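One simple way to locate influential customers is to treat each referral as a directed edge and count how many edges leave each customer. The sketch below assumes a hypothetical list of (referrer, new customer) pairs; the names are invented.

```python
from collections import Counter

# Hypothetical referral records: (referrer, new_customer).
referrals = [
    ("Asha", "Ben"), ("Asha", "Chen"), ("Asha", "Dev"),
    ("Ben", "Esha"), ("Chen", "Farid"), ("Chen", "Gita"),
]

# Each referral is a directed edge in the graph. A customer's out-degree
# is the number of people they referred directly.
direct_referrals = Counter(referrer for referrer, _ in referrals)

# The customer with the highest out-degree is a candidate for targeted offers.
most_influential = direct_referrals.most_common(1)[0]
print(most_influential)
```

Counting direct referrals is the crudest measure of influence; a fuller analysis could also follow chains of referrals through the graph.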
25. Other Applications
• Analyzing link between web pages
• Social Network Analysis
• Spread of diseases (like flu) / Spread of ideas / diffusion of
innovation
• Crime & terror links
26. Google – Directed Graph Example
• Web pages = nodes
• Hyperlinks = edges
• Spiders and web crawlers keep the graph updated
27. Google – example continued
• Authority versus mere popularity
• Ranking by the number of unrelated sites linking to a site yields popularity
• Ranking by the number of subject-related hubs that point to a site yields authority
• This helps to overcome a situation that often arises with popularity ranking, where the real authority is ranked lower because few links point to it
28. Page Rank
• Each link acts as a vote which
helps in ranking
• Analyse the incoming links to your
page
• Evaluate the quality of link
• Analyse the link building strategy
• Improve your page rankings
• Used in SEO
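The "each link acts as a vote" idea can be illustrated with the simplified textbook form of PageRank, computed by repeated redistribution of rank along links. This is a sketch only: the four-page link graph is hypothetical, the damping factor of 0.85 is the commonly quoted default and is assumed here, and the code does not claim to reproduce any production search ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to;
    every linked page must itself appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: web pages = nodes, hyperlinks = edges.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
print(top_page, ranks)
```

In this graph page C collects votes from A, B and D and ends up with the highest rank, while D, which nothing links to, keeps only the baseline.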
31. Anomalies or Deviations
• Outliers are data points that are considerably different from the remainder of the data.
• Anomalous events occur infrequently, but their consequences can be dramatic and are very often negative.
• In 1985 the British Antarctic Survey showed a depletion of the ozone layer by 10%. Why did NASA’s Nimbus 7 satellite not capture this fact?
32. Deviation Detection
• Focuses on discovering the most significant changes in the data from
previously measured, expected or normative values.
• Most CRM solutions have a DD task running in parallel on a regular
basis.
• Deviation might be a result of random occurrence or a result of true
change in fundamentals.
• It is predictive.
• It is exploratory in nature and does not ascertain the reason for the
occurrence of the detected pattern.
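A minimal sketch of a deviation-detection check compares each new value with the mean of previously measured values and flags it when it lies too many standard deviations away. The spending figures and the cut-off of 3 standard deviations are assumptions made for illustration; the deck does not prescribe a particular rule.

```python
from statistics import mean, stdev


def find_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard deviations
    from the mean of the previously measured values."""
    mu = mean(history)
    sigma = stdev(history)
    return [v for v in new_values if abs(v - mu) > threshold * sigma]


# Hypothetical daily card spend of one customer (previously measured values).
history = [42, 38, 45, 40, 39, 44, 41, 43, 37, 41]

# New transactions to screen.
new_values = [40, 46, 250]

flagged = find_deviations(history, new_values)
print(flagged)
```

As the slide notes, a flag only says that a value is unusual; whether it is random variation or a true change still has to be investigated.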
33. Application of Deviation detection
Mostly used for security purposes
• Fraud detection in insurance claims (fictitious or inflated claims)
• Deviation in credit card/ATM/debit card usage (purchases not fitting the profile of the customer, large e-commerce purchases, flagging sudden international transactions, small purchases followed by big purchases)
• Telecommunications fraud detection (subscription fraud, superimposed fraud)
• Store sales deviation (in chain stores: revenues, expenses)
• Deviation in medical parameters (gene-linked diseases)
• Network intrusion detection
35. Segmentation
Segmentation is a key data
mining technique.
Segmentation refers to the business task
of identifying groups of records that are
similar to each other (A segment is a
group of consumers that react in a
similar way to a particular marketing
approach)
36. Discussion
• What is the difference between Segmentation and Clustering?
Clustering (a tool for segmentation) is the process of finding similarities among customers so that they can be grouped, and therefore segmented.
• What is the difference between Segmentation and Classification?
Classification, in the traditional sense, uses labelled training data and assigns each item to one of the predefined classes. (It is a predictive technique.)
37. Segmentation can be directed or undirected
• Directed: one simple way of segmenting is to group customers by the type of credit card that they use (silver, gold, platinum, etc.).
• Undirected: sometimes the target variables are not known in advance; the goal then is to search for patterns that suggest targets. The clustering tool is used for this purpose. Example: grouping people together based on their size for stitching military uniforms.
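The uniform-sizing example can be sketched with k-means, one common clustering algorithm (the deck does not name a specific one, so its use here is an assumption). The chest measurements and the three starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on a single numeric attribute, from given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical chest measurements (cm) of recruits.
chest_cm = [86, 88, 90, 98, 100, 102, 110, 112, 114]

# Start from three rough guesses, one per uniform size to be produced.
centroids, clusters = kmeans_1d(chest_cm, centroids=[85, 100, 115])
print(centroids)
print(clusters)
```

No size label is supplied in advance: the three groups emerge from the data, which is what makes this an undirected segmentation.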
38. Need for segmentation
• Today, businesses that cast a wide net in their marketing activities cannot maximize their profitability and return on investment.
• The ability to identify segments with latent needs, purchase behavior patterns, attitudes, etc. makes it easier to design actionable marketing programs.
Segmentation helps in
1. Providing a platform to identify and serve your most profitable customers
2. Focusing retention efforts
3. Customizing messages and offers
4. Identifying less served groups
39. Some factors that can be used to segment
customers
• Geography / Demographics
• Psychographics – Attitudes, Interest, opinions etc
• Buying behaviour – usage, purchase occasion
• Length of relationship / Tenure in current subscription
• Revenue / profitability / CLV
• Loyalty / propensity to churn
• Channel used
42. Data mining tools
1. Decision Tree: a tool that classifies cases into a finite number of classes by considering one variable at a time and dividing the entire set based on it. (Helps in the classification task.)
2. Rule Induction (Association rules): a tool that helps in extracting formal rules from a set of observations. (Helps in finding links between products.)
3. Nearest Neighbor Technique: a tool that uses past data instances with known output values to predict the unknown output value of a new data instance. (Helps in deviation detection and classification.)
4. Clustering: the process of grouping observations of similar kinds into smaller groups within the larger population. (Helps in the segmentation task.)
5. Visualization: graphical representation of data.
43. 1. Decision tree
• A decision tree is a tool that classifies cases into a finite number of classes by considering one variable at a time and dividing the entire set based on it.
• Classification is possible because at each node the decision tree splits the database into segments based on the value of the variable considered.
• A decision tree is a hierarchical collection of rules that describe how to divide a large collection of records into successively smaller groups of records. With each successive division, the members of the resulting segments become more and more similar to one another with respect to the target.
44. How it works?
• Decision trees recursively split the data into smaller and smaller cells which are increasingly “pure” in the sense of having similar values of the target.
• A decision tree uses the target variable to determine how each input should be partitioned.
• Taken together, the rules for all the segments form the decision tree model.
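To make "purity" concrete, the sketch below scores two candidate first splits on the ten training records of the income-tax example from earlier in the deck. It assumes Gini impurity as the purity measure; that is one common choice, and the deck does not say which measure is intended.

```python
from collections import Counter

# Training records from the income-tax example: (refund, marital_status, cheat).
records = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]


def gini(labels):
    """Gini impurity: 0 for a perfectly pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, column):
    """Weighted Gini impurity of the groups obtained by splitting on `column`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


root = gini([row[-1] for row in records])   # impurity before any split
by_refund = split_impurity(records, 0)      # split on Refund
by_marital = split_impurity(records, 1)     # split on Marital Status
print(root, by_refund, by_marital)
```

On these ten records the marital-status split gives a weighted impurity of 0.30, lower than the refund split (about 0.34) and the unsplit data (0.42), so under this measure it produces the purer segments.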
47. 2. Rule Induction
Rule induction is an area of machine learning in which formal rules are extracted
from a set of observations.
48. Rule Induction - Association Rules
• In rule induction, we focus on association rules in our syllabus.
• Association rules were originally derived from point-of-sale data that describes transactions consisting of several products.
• These are rules of the form ‘if A then B’ (A → B).
• These rules do not give the exact nature of the relationship but only point towards a general interaction between the two items.
• Association analysis is a type of undirected data mining that finds patterns in the data where the target is not specified beforehand.
• Useful rules contain high-quality, actionable information. After the pattern is found, justifying it by telling a story can lead to insights and action.
49. Association Rule – application in retail sector
• Association rules help in market basket analysis in the retail sector: what are the product affinities in a shopping cart?
• IF <LHS> THEN <RHS>
• IF { X } THEN { Y }
• They imply a relationship between X and Y, where X and Y can be single items or sets of items.
• X is sometimes referred to as the antecedent and Y as the consequent.
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
50. Rules are not always useful. They might be
actionable at times, and trivial or inexplicable
at other times
• Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of
also purchasing one of the three types of a brand of candy bars (actionable).
• Customers who purchase maintenance agreements are very likely to
purchase large appliances (trivial findings)
• When a new hardware store opens, one of the most commonly sold items is
toilet cleaners (inexplicable)
51. Developing Association Rules
• The table on the right side shows the transactions of 5 different customers at a grocery store.
• Each transaction gives information about which products are purchased with which other products.
• The information is then copied onto a co-occurrence matrix that gives the number of times any pair of products was purchased together.
• The goal is to turn item sets into association rules.
52. Co-Occurrence of Products
The numbers in the co-occurrence matrix represent the number of transactions in which a pair of products occurs together.
For example, window cleaner and orange juice occur together only once, while soda and orange juice occur together in 2 transactions out of 5.
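A co-occurrence count of this kind can be built by listing every pair of products in every basket. The transaction table from the slide is not reproduced in this text, so the five baskets below are an assumed stand-in, chosen only so that they reproduce the two counts quoted above.

```python
from collections import Counter
from itertools import combinations

# ASSUMED baskets (the slide's own table is not available in the text).
transactions = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "soda"},
]

# Count, for every pair of products, the number of baskets containing both.
co_occurrence = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        co_occurrence[pair] += 1

print(co_occurrence[("orange juice", "soda")])
print(co_occurrence[("orange juice", "window cleaner")])
```

Sorting each basket before pairing means a pair is always stored in the same order, so ("orange juice", "soda") and ("soda", "orange juice") are not counted separately.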
53. How good is an association rule
Three traditional methods for
measuring the goodness of an
association rule are
•Support
•Confidence
•Lift
54. Support, Confidence and Lift
• Support shows the prevalence of the rule in the given set of transactions. It
measures the ratio of transactions that contain all the items in the rule to
the total number of transactions. A high level of support indicates that the
rule is frequent enough for the business to be interested in it.
• Confidence measures how good a rule is at predicting the right hand side,
by comparing how often the right-hand side appears when the condition
on the left-hand side is true. It is the probability of the occurrence of the
consequent once the antecedent has taken place (conditional probability). It
is calculated as the ratio of number of transactions supporting the entire
rule to the number of transactions supporting the left-hand side of the rule.
• Lift measures the power of association of the rule. Lift indicates how much more likely the consequent is to be purchased if the customer has bought the antecedent, compared with the likelihood of the consequent being purchased in general. Lift must be above 1 for the association to be of interest. Lift is calculated as the ratio of the confidence of the rule to the prevalence (support) of the right-hand side of the rule.
55. Calculating Support
A = soda and B = orange juice
Support(A → B) = n(A ∩ B) / total number of transactions
• No. of transactions in which soda and orange juice appear together = 2
• Total no. of transactions (total customer billings) = 5
• Support for IF <Soda> THEN <OJ> = 2/5 = 40%
56. Calculating Confidence
Confidence(A → B) = Support(A ∩ B) / Support(A)
Support(A ∩ B) = 2/5
Support(A) = 3/5
Confidence for IF <Soda> THEN <OJ> = (2/5) / (3/5) = 2/3 ≈ 67%
57. Calculating Lift
• Lift (also called improvement) measures the power of association of the full rule.
• It is needed because the items on the RHS might be very common, in which case the confidence alone tells us little about the actual predictive power of the rule.
Lift(A → B) = Support(A ∩ B) / (Support(A) × Support(B))
or, equivalently,
Lift(A → B) = Confidence(A → B) / Support(B)
58. Calculating Lift
Confidence of the rule A → B = 67%
Support(B) = 4/5 = 80%
Lift(A → B) = 67/80 ≈ 0.84 (0.83 if the unrounded confidence of 2/3 is used)
Here Lift is < 1, so the rule is not useful.
If Lift is > 1, the association rule is well established: the RHS appears in a larger share of transactions when the LHS is present than it does overall.
Hypothetically, if Lift in this case were 1.25, we could say that ‘OJ’ is 25% more likely to be in a transaction when ‘Soda’ is purchased than in a transaction taken at random.
59. Association Rule : Application
Retail
1. Supermarket shelf management
2. Bundling products with offers
3. Inventory Management
4. Pricing associated products
5. Catalog designing
6. Recommender systems
Other sectors
1. Items purchased on a credit card such as rental cars and hotel rooms.
2. Optional services purchased by telecommunication customer (call forwarding, caller tune etc.)
3. Banking products used by retail customers (money market accounts, investment services, car
loans etc.)
4. Medical patient histories can give indications of likely complications on certain combinations of
treatments.
60. Support, confidence and Lift
1.Milk, Bread, Orange juice
2.Milk, Bread, floor cleaner
3.Milk, bread, Chocolate
4.Chocolate, floor cleaner
Support of If milk then bread = 75%
Confidence = 100%
Lift (of If milk then bread) = .75
.75 x .75
= 1.33
Inference There is a strong positive
association
1.Milk, Cornflakes, Noodles
2.Milk, Shampoo, Bread
3.Shampoo, soap, noodles
4.Milk, Cornflakes
Support of If milk then Shampoo = 25%
Confidence = 33%
Lift of If milk then Shampoo = .25 / (.75 x .5) = .67
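As a cross-check, the three measures for both sets of baskets can be recomputed directly from the transactions. The sketch below uses a hypothetical helper, `rule_metrics`, and assumes the LHS item occurs in at least one basket (otherwise the confidence is undefined).

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule IF lhs THEN rhs."""
    baskets = [{item.lower() for item in t} for t in transactions]
    n = len(baskets)
    support_lhs = sum(lhs in b for b in baskets) / n
    support_rhs = sum(rhs in b for b in baskets) / n
    support_both = sum(lhs in b and rhs in b for b in baskets) / n
    confidence = support_both / support_lhs          # assumes lhs occurs
    lift = support_both / (support_lhs * support_rhs)
    return support_both, confidence, lift

first = [
    ["Milk", "Bread", "Orange juice"],
    ["Milk", "Bread", "Floor cleaner"],
    ["Milk", "Bread", "Chocolate"],
    ["Chocolate", "Floor cleaner"],
]
second = [
    ["Milk", "Cornflakes", "Noodles"],
    ["Milk", "Shampoo", "Bread"],
    ["Shampoo", "Soap", "Noodles"],
    ["Milk", "Cornflakes"],
]

milk_bread = rule_metrics(first, "milk", "bread")
milk_shampoo = rule_metrics(second, "milk", "shampoo")
print(milk_bread)
print(milk_shampoo)
```

The first rule has a lift of about 1.33 (positive association); the second has a lift of about 0.67, so milk buyers are less likely than average to buy shampoo in this small sample.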
62. NNT - Introduction
• Distance and similarity are central to the nearest neighbour technique (NNT)
• It uses past data instances with known output values to predict the unknown output value of a new data instance.
• It is based on:
1. Memory-Based Reasoning (MBR)
Identifying similar cases from past experience and applying the information from these cases to the problem at hand.
2. Collaborative filtering or social filtering
Finding neighbours who are similar (with similar preferences) to a new record and using them for classification and prediction.
63. Height and weight of known cases
• X axis – Height
• Y axis – Weight
• Suppose the dark circles represent men and the white circles represent women. Given the height and weight of a new data point, we can predict whether it is a man or a woman.
• NNT relies on two operations:
- A distance function capable of calculating the distance between any two records
- A combination function capable of combining the results from several neighbours to arrive at an answer
• The K value (how many neighbours to consider when combining) has to be specified
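The two operations and the K value map onto a few lines of Python. This is a minimal sketch, not a production classifier: the height/weight records are made up for illustration, Euclidean distance is assumed as the distance function, and a simple majority vote is assumed as the combination function.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance function: straight-line distance between two records."""
    return math.dist(a, b)

def knn_predict(known, new_point, k=3, distance=euclidean):
    """Combination function: majority vote among the k nearest known cases.

    `known` is a list of (features, label) pairs.
    """
    neighbours = sorted(known, key=lambda rec: distance(rec[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical (height in cm, weight in kg) records with known labels.
known_cases = [
    ((180, 82), "man"), ((175, 78), "man"), ((185, 90), "man"),
    ((160, 55), "woman"), ((165, 60), "woman"), ((158, 52), "woman"),
]

prediction = knn_predict(known_cases, (178, 80), k=3)
print(prediction)
```

Choosing an odd K avoids tied votes when there are only two classes.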
64. Attributes to consider
• People living in the same neighborhood generally have similar household incomes, so when a new person moves in, there is a high chance that their income is similar to that of the other people in the neighborhood
• However, other factors, such as the degree attained or the college attended, also affect a person's income and may be better predictors of it
• So 'near' can be defined by the degree attained or the college attended rather than by the neighborhood
65. Applications of NNT
• Deviation detection – flag cases for further investigation based on
nearest neighbour
• Customer response prediction – Next likely customers for campaigns
• Loan default prediction – predicting default based on income and
debt or any other available variables
• Medical treatment – Diagnosis and treatment of cases (classifying
anomalies in mammograms)
• Filling in missing values in data – classification
• Plagiarism testing
• Making recommendations - Netflix
66. NNT characteristics
• NNT uses data "as is" and is not affected by the format of records, as the algorithms are robust to missing and erroneous data
• NNT has many advantages, but it is a resource hog: it requires a large amount of historical data and a large amount of processing time
67. Finding nearest neighbor for audio files
• Shazam identifies a song in half a minute and provides the name of the song, the artist, etc.
• Over 100 million downloads
• Many challenges for this app: direct comparison of snippets of music will not work
• Uses the nearest neighbor technique to classify songs based on their audio signature
69. Some apps are listening to you through the smartphone's mic to track your TV viewing, says report
• The apps use software from Alphonso, a start-up that collects TV-viewing data for advertisers. Using a smartphone's microphone, Alphonso's software can detail what people watch by identifying audio signals in TV ads and shows, sometimes even matching that information with the places people visit and the movies they see. The information can then be used to target ads more precisely and to try to analyze things like which ads prompted a person to go to a car dealership.
This feature is based on NNT
73. 4. Clustering
Clustering is the process of grouping
observations of similar kinds into smaller groups
within the larger population.
Clusters are homogeneous within and
heterogeneous across, based on given
characteristics.
74. Example - Sales data for 9 stores
Store   Electric Kettle   Refrigerators
S1      35                12
S2      42                15
S3      40                10
S4      10                20
S5      14                18
S6      13                19
S7      4                 25
S8      8                 30
S9      6                 27
76. Characteristics of Clustering
• No dependent variable; no causal relationships
• Popularly called subjective segmentation
• Undirected / unsupervised data mining technique
77. Characteristics of Clustering
• It relies on geometric interpretation of data as points in space
• Based upon distance and similarity concept
• Clustering can be a preliminary step for data mining.
78. Data preparation for clustering
• The data preparation issues are the same as those for memory-based reasoning, because both rely on the concept of distance.
• Scaling and weighting are important
• Scaling adjusts the values of variables to take into account the fact that different variables are measured in different units or over different ranges.
• Weighting provides a relative adjustment for a variable because some variables are more important than others
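A small sketch of both adjustments follows. Min-max scaling is just one common choice of scaling, the weights are arbitrary, and the customer records (age in years, annual income) are hypothetical. Without scaling, the income column would dominate every distance simply because its numbers are larger.

```python
import math

def min_max_scale(records):
    """Scaling: rescale every column to the 0-1 range."""
    columns = list(zip(*records))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(rec, lows, highs))
        for rec in records
    ]

def weighted_distance(a, b, weights):
    """Weighting: Euclidean distance with a relative weight per variable."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical customers: (age in years, annual income).
customers = [(25, 300_000), (45, 320_000), (27, 900_000)]
scaled = min_max_scale(customers)

# Illustrative weights: age counts fully, income only a quarter as much.
weights = (1.0, 0.25)
d_01 = weighted_distance(scaled[0], scaled[1], weights)
d_02 = weighted_distance(scaled[0], scaled[2], weights)
print(scaled)
print(d_01, d_02)
```

With these weights the first customer is closer to the third (similar age) than to the second (similar income); on the raw, unscaled values the opposite would hold.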
79. The K-Means clustering algorithm
• One of the most commonly used clustering algorithms.
• “K” refers to the fact that the algorithm looks for a fixed number of
clusters
• K is specified by the user
• Each record is considered a point in a scatter plot, which implies that
all the input variables are numeric
• The data can be pictured as clouds on the scatter plot
80. Steps in K-means clustering Algorithm
• The K-means algorithm starts with an initial guess and uses a series of steps to improve upon it.
• Initial cluster seeds are chosen randomly
• Each record is ASSIGNED to the cluster defined by its nearest cluster seed
• Next the algorithm identifies the centroid of each cluster. A centroid is the average position of the cluster members in each dimension. The centroids become the UPDATED cluster centres.
• Each record is then reassigned to the cluster whose centroid is nearest to it
• The algorithm alternates between two steps
• Assignment step
• Update step
81. Steps in K-means clustering Algorithm
• When the assignment step changes the cluster membership, the update step is executed again. Iteration continues until the cluster membership stops changing
• In other words, the algorithm keeps alternating between update and assignment until no new assignment is made
• A diagrammatic representation of the K-means clustering is shown in the
following slides
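The alternation between the assignment step and the update step can also be sketched in code. The version below is a minimal illustration run on the nine-store sales figures from slide 74. The function names, the `seeds` and `max_iter` parameters, and the choice of S1, S4 and S7 as starting seeds (made so that the run is repeatable) are all assumptions for this example; when no seeds are passed, they are drawn at random as the slides describe.

```python
import math
import random

def centroid(points):
    """Average position of the cluster members in each dimension."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def k_means(points, k, seeds=None, max_iter=100):
    centres = list(seeds) if seeds is not None else random.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:      # no new assignment was made: stop
            break
        labels = new_labels
        # Update step: move each centre to the centroid of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # an empty cluster keeps its old centre
                centres[c] = centroid(members)
    return labels, centres

# (electric kettles, refrigerators) sold by stores S1..S9.
stores = [(35, 12), (42, 15), (40, 10), (10, 20), (14, 18),
          (13, 19), (4, 25), (8, 30), (6, 27)]

labels, centres = k_means(stores, 3, seeds=[stores[0], stores[3], stores[6]])
print(labels)
print(centres)
```

With these seeds the stores fall into three groups: S1 to S3, S4 to S6 and S7 to S9. Different random seeds can lead to a different final grouping, which is why K-means is often run several times.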
82. K – Means Clustering Algorithm – First step – random cluster seeds are chosen
83. K – Means Clustering Algorithm – Assignment step
84. K – Means Clustering Algorithm – Calculating Centroid
86. Hard and soft clusters
• A cluster model can make “hard” or “soft” cluster
assignments
• Hard clustering assigns each record to a single cluster
• Soft clustering associates each record with several clusters
with varying degree of strength.
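The difference between the two kinds of assignment can be illustrated as follows. The slides do not say how the strengths are computed, so the sketch assumes one simple option: membership proportional to the inverse of the squared distance to each cluster centre, normalised to sum to 1. The cluster centres and the record are hypothetical.

```python
import math

def hard_assignment(point, centres):
    """Hard clustering: index of the single nearest cluster centre."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))

def soft_memberships(point, centres):
    """Soft clustering: one strength per cluster, summing to 1.

    Assumption: strength is proportional to 1 / distance squared.
    """
    distances = [math.dist(point, c) for c in centres]
    on_centre = [d == 0.0 for d in distances]
    if any(on_centre):                # the point sits exactly on a centre
        return [flag / sum(on_centre) for flag in on_centre]
    inverse = [1.0 / d ** 2 for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]

# Hypothetical cluster centres and one record to place.
centres = [(39, 12), (12, 19), (6, 27)]
record = (10, 22)

hard = hard_assignment(record, centres)
soft = soft_memberships(record, centres)
print(hard)
print([round(s, 3) for s in soft])
```

The hard assignment reports only the second cluster, while the soft memberships also show that the record has a noticeable affinity to the third.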
87. Evaluating clusters
• A good cluster centre should lie in the densest part of the data cloud.
• The best assignment of cluster centres could be defined as the one that minimizes the sum of the distances from every data point to its nearest cluster centre.
• Clusters should have members with a high degree of similarity (close to each other)
• Clusters themselves should be widely spaced
• Clusters should have roughly the same number of members (except when detecting fraud and anomalies)
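The distance criterion can be turned into a simple score for comparing candidate sets of cluster centres. In this sketch the function name `total_distance` is illustrative, the points are the nine-store sales figures from slide 74, and both sets of centres are chosen by hand for the comparison.

```python
import math

def total_distance(points, centres):
    """Sum of the distances from every point to its nearest cluster centre."""
    return sum(min(math.dist(p, c) for c in centres) for p in points)

# (electric kettles, refrigerators) sold by stores S1..S9.
stores = [(35, 12), (42, 15), (40, 10), (10, 20), (14, 18),
          (13, 19), (4, 25), (8, 30), (6, 27)]

# Two hand-picked candidates: one centre per visible group of stores,
# versus three centres bunched together in the middle of the plot.
well_placed = [(39.0, 12.33), (12.33, 19.0), (6.0, 27.33)]
badly_placed = [(20.0, 15.0), (25.0, 20.0), (30.0, 25.0)]

good_score = total_distance(stores, well_placed)
bad_score = total_distance(stores, badly_placed)
print(good_score, bad_score)
```

The lower score belongs to the centres that sit inside the dense groups of stores.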
88. Some basic features about clustering
• There may not be any prior knowledge concerning the clusters
• Cluster results are dynamic as the cluster membership may change over
time
• There is no single correct answer to a clustering problem
• Outliers are difficult to handle in clustering: they are either put in solitary clusters or forced into the nearest cluster
89. Clustering: USES
• Grouping customers for targeting offers
• Grouping customers for cross selling and upselling
• Grouping customers for customizing
• Categorizing products for mark down (In the first scatter plot electric kettles
can be discounted in certain stores and refrigerators can be marked down in
certain others)
• Calculating footfall to branches of a store based on their location
90. Voronoi Diagram
• Based on the concept of clustering
• Each region on a Voronoi diagram consists of all the points closest to one of
the cluster centres.
• It is a diagram whose lines mark the points that are equidistant from the two
nearest cluster seeds.
• Voronoi diagrams facilitate nearest-location queries, such as finding the closest metro station, cell phone tower, etc. to a given address.
• A Voronoi representation can be used to find the catchment zone for the outlets of a chain store. It can be used to estimate the number of students who will enrol in the nearest primary school, the population that will visit the closest healthcare centre, etc.
91. Example
There are 19 pizza outlets in Mumbai belonging to the same company, selling the same products with the same pricing and quality of service. We want to estimate the customer footfall for a particular store in one region of Mumbai.
We assume that each customer will visit the closest store. A Voronoi diagram will define the catchment area of each store, which gives us a rough estimate of the customer footfall for each store.
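Under the closest-store assumption, a customer belongs to an outlet's Voronoi region exactly when that outlet is the nearest one, so the footfall estimate reduces to a nearest-outlet count. The sketch below uses three hypothetical outlets and six hypothetical customer locations on an arbitrary grid rather than real Mumbai data, and straight-line distance stands in for travel distance.

```python
import math
from collections import Counter

def nearest_outlet(customer, outlets):
    """Name of the outlet whose Voronoi region contains the customer."""
    return min(outlets, key=lambda name: math.dist(customer, outlets[name]))

def estimate_footfall(customers, outlets):
    """Count the customers that fall in each outlet's catchment area."""
    return Counter(nearest_outlet(c, outlets) for c in customers)

# Hypothetical outlet and customer coordinates.
outlets = {
    "Outlet A": (0.0, 0.0),
    "Outlet B": (10.0, 0.0),
    "Outlet C": (5.0, 8.0),
}
customers = [(1, 1), (2, -1), (9, 1), (6, 7), (4, 9), (5, 5)]

footfall = estimate_footfall(customers, outlets)
print(footfall)
```

In practice the customer points would come from address or population data for the region being studied.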