Essentials of Business Analytics
Robert Clark
Lawrence Chilton (Mentor)
Brigham Young University – Idaho
Spring 2016
Essentials of Business Analytics
 What is Business Analytics?
 Part 1 – Descriptive Data Mining Through Cluster Analysis
 Part 2 - Predictive Data Mining Through Classification
 Part 3 – Linear Optimization Models
What is Business Analytics?
 Business analytics is the analysis of data in order to drive
business decisions.
http://www.thehansindia.com/posts/index/Young-Hans/2016-03-09/Business-Analytics-course-at-IIT-H-/212475
Descriptive Data Mining Through Cluster
Analysis
 Goal: segment observations into similar groups.
 Two common methods: Hierarchical and k-Means
https://www.quora.com/Does-measuring-clustering-efficiency-with-precision-and-recall-make-any-sense
Part 1
Descriptive Data Mining Through Cluster
Analysis
 Here is the data we are clustering:
Part 1
Descriptive Data Mining Through Cluster
Analysis
 Method 1 – Hierarchical Clustering
Part 1
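A minimal sketch of the hierarchical (agglomerative) procedure named on this slide: every observation starts as its own cluster and the two closest clusters are merged repeatedly. The stopping rule (a target number of clusters), the single-linkage distance between clusters, the function name `agglomerate` and the sample points are all assumptions made for illustration; the slides do not say which linkage was used.

```python
from math import dist

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every observation starts alone
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage (assumed): Euclidean distance between
                # the two closest members of the clusters
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # join the closest pair
        del clusters[j]
    return clusters

# Made-up observations, not the presentation's data set.
observations = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = agglomerate(observations, 3)
```

Stopping at a chosen number of clusters plays the same role as drawing the horizontal cut line on a dendrogram.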
Descriptive Data Mining Through Cluster
Analysis
 Method 2 – k-Means Clustering
Part 1
Descriptive Data Mining Through Cluster
Analysis
[Figure panels: k = 2, k = 3, k = 4, k = 5]
Part 1
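The k-means procedure can be sketched roughly as follows: convert each variable to z-scores, give every observation a starting cluster, then alternate between computing centroids and reassigning observations until nothing changes or an iteration limit is hit. The function names, the `initial_labels` parameter (added so the run is repeatable), the reseeding of empty clusters and the age/income figures are assumptions for illustration, not the presentation's data or software.

```python
import random
from math import dist
from statistics import mean, pstdev

def z_scores(rows):
    """Standardize each column so no single variable dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def k_means(rows, k, max_iter=100, initial_labels=None, seed=0):
    rng = random.Random(seed)
    data = z_scores(rows)
    # Start from a random assignment unless one is supplied.
    if initial_labels is not None:
        labels = list(initial_labels)
    else:
        labels = [rng.randrange(k) for _ in data]
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids.append(tuple(mean(col) for col in zip(*members)))
            else:
                centroids.append(rng.choice(data))  # reseed an empty cluster
        # Reassign each observation to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:  # no changes: converged
            break
        labels = new_labels
    return labels

# Hypothetical (age, income) pairs.
customers = [(25, 30000), (30, 35000), (28, 32000),
             (60, 90000), (65, 95000), (62, 88000)]
labels = k_means(customers, 2, initial_labels=[0, 1, 0, 1, 0, 1])
```

Without the z-score step, the income column would dominate every distance because its values are so much larger than the ages.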
Predictive Data Mining Through
Classification
 Goal: classify a new observation based on current data
 Three common methods: Logistic Regression, k-NN, CART
Part 2
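Classification models are judged by their error rates on a testing set. A small sketch of the three rates, assuming outcomes are coded 1 and 0; the function name `error_rates` and the sample outcomes are made up for illustration.

```python
def error_rates(actual, predicted):
    """Class 1 error, class 0 error and overall error as fractions."""
    pairs = list(zip(actual, predicted))
    class1 = [p for a, p in pairs if a == 1]  # predictions for true class 1
    class0 = [p for a, p in pairs if a == 0]  # predictions for true class 0
    return {
        # true class 1 observations that were predicted to be class 0
        "class_1_error": class1.count(0) / len(class1),
        # true class 0 observations that were predicted to be class 1
        "class_0_error": class0.count(1) / len(class0),
        # all incorrectly classified observations
        "overall_error": sum(a != p for a, p in pairs) / len(pairs),
    }

# Hypothetical testing-set outcomes and model predictions.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
rates = error_rates(actual, predicted)
```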
Predictive Data Mining Through
Classification
 Method 1 – Logistic Regression
Part 2
Cutoff Value
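A sketch of how a fitted logistic regression turns a predictor into a class using the cutoff value. The coefficients below are hypothetical, chosen only so that the 50% cutoff falls between 10 and 11 nominations; they are not the fitted values from the presentation.

```python
from math import exp

# Hypothetical coefficients (assumption): intercept and slope per nomination.
B0, B1 = -6.3, 0.6

def probability(nominations):
    """p-hat = 1 / (1 + exp(-(b0 + b1 * x)))"""
    return 1 / (1 + exp(-(B0 + B1 * nominations)))

def classify(nominations, cutoff=0.5):
    """Class 1 (wins best picture) if the probability exceeds the cutoff."""
    return 1 if probability(nominations) > cutoff else 0

predictions = {n: classify(n) for n in (8, 10, 11, 13)}
```

Raising the cutoff makes the model more reluctant to predict class 1; lowering it does the opposite.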
Predictive Data Mining Through
Classification
 Method 2 – k-Nearest Neighbors
Part 2
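A minimal sketch of k-nearest neighbors for a two-class outcome: find the k closest training observations and compare the share that are class 1 against the cutoff. The function name `knn_classify`, the use of Euclidean distance and the training points are illustrative assumptions.

```python
from math import dist

def knn_classify(training, new_point, k, cutoff=0.5):
    """training is a list of (features, label) pairs with labels 0 or 1."""
    # Sort by Euclidean distance to the new observation and keep the k closest.
    nearest = sorted(training, key=lambda row: dist(row[0], new_point))[:k]
    share_class1 = sum(label for _, label in nearest) / k
    return 1 if share_class1 > cutoff else 0

# Made-up training observations.
training = [((1, 1), 0), ((1, 2), 0), ((2, 1), 0),
            ((6, 6), 1), ((6, 7), 1), ((7, 6), 1)]
near_ones = knn_classify(training, (6.5, 6.5), k=3)
near_zeros = knn_classify(training, (1.5, 1.5), k=3)
```

To choose k, one could run this over k = 1, ..., 20 on a testing set and keep the value with the lowest classification error.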
Predictive Data Mining Through
Classification
 Method 3 – Classification and Regression Trees
Part 2
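A rough sketch of the idea behind one split in a classification tree: measure impurity as the proportion of a group that would be misclassified by its majority class, then pick the threshold on one variable that gives the lowest weighted impurity. The function names and data are hypothetical, and real CART software may use a different impurity measure.

```python
from collections import Counter

def impurity(labels):
    """Proportion of the group not in its majority class (0 means pure)."""
    majority = Counter(labels).most_common(1)[0][1]
    return 1 - majority / len(labels)

def best_split(xs, labels):
    """Return (threshold, weighted impurity) of the best split on one variable."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Made-up values of one explanatory variable and their classes.
split = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A full tree repeats this search inside each resulting group until a group is too small to divide.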
Predictive Data Mining Through
Classification
 Which method is best at classifying new observations?
Part 2
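Error rates matter because the two kinds of mistake cost different amounts. A sketch using the illustrative figures from the presenter's notes ($100 of advertising wasted on a customer who was never going to cancel, $1,000 lost on a cancellation that was not predicted); the error counts for the two models are hypothetical.

```python
def misclassification_cost(false_alarms, missed_cancellations,
                           ad_cost=100, lost_revenue=1000):
    """Total cost of a model's mistakes under the assumed dollar figures."""
    return false_alarms * ad_cost + missed_cancellations * lost_revenue

# Two hypothetical models that make the same total number of errors (60).
model_a = misclassification_cost(false_alarms=50, missed_cancellations=10)
model_b = misclassification_cost(false_alarms=10, missed_cancellations=50)
```

With the same overall error count, the model with fewer missed cancellations (lower class 1 error) is far cheaper.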
Linear Optimization Models
 Goal: Maximize or minimize the objective function
https://en.wikipedia.org/wiki/Linear_programming
Part 3
Linear Optimization Models
Part 3
 Par, Inc. wants to make standard and deluxe golf bags. They
are constrained by a limited amount of time for each
production step.
 They make $10 profit for standard bags, $9 profit for deluxe
bags
 Objective function to maximize: 10S + 9D
Linear Optimization Models
[Figure: Feasible Region, Constraint Functions]
Part 3
Linear Optimization Models
[Figure: Feasible Region, Objective Function, Optimized solution]
Part 3
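The graphical method can be sketched in code by listing the extreme points of the feasible region and evaluating the objective 10S + 9D at each. The slides do not give the constraint numbers, so the per-bag times and available hours below are assumptions taken from the commonly quoted textbook version of the Par, Inc. problem; treat them as illustrative.

```python
from itertools import combinations

# (hours per standard bag, hours per deluxe bag, hours available) -- assumed
CONSTRAINTS = [
    (7 / 10, 1.0, 630.0),     # cutting and dyeing
    (1 / 2, 5 / 6, 600.0),    # sewing
    (1.0, 2 / 3, 708.0),      # finishing
    (1 / 10, 1 / 4, 135.0),   # inspection and packaging
]
# Add the axes S = 0 and D = 0 as boundary lines.
BOUNDARIES = CONSTRAINTS + [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

def intersect(line1, line2):
    """Intersection of a1*S + b1*D = c1 and a2*S + b2*D = c2, or None."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel lines
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def feasible(s, d, tol=1e-6):
    return (s >= -tol and d >= -tol and
            all(a * s + b * d <= c + tol for a, b, c in CONSTRAINTS))

def profit(point):
    s, d = point
    return 10 * s + 9 * d  # objective function from the slide

extreme_points = []
for l1, l2 in combinations(BOUNDARIES, 2):
    p = intersect(l1, l2)
    if p is not None and feasible(*p):
        extreme_points.append(p)

best = max(extreme_points, key=profit)
```

Checking only the corners works because a linear objective over a bounded feasible region reaches its maximum at an extreme point.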
Linear Optimization Models
 J.D. Williams Inc. Case Study:
 3 funds
 Growth stock fund (18% yield, .10 risk)
 Income fund (12.5% yield, .07 risk)
 Money market fund (7.5% yield, .01 risk)
 Client has $800,000 to invest. How should the client allocate
their money at a controlled risk level while maximizing profit?
Part 3
Linear Optimization Models
 Growth stock fund: $320,000
 Income fund: $240,000
 Money market fund: $240,000
 Yearly return: $105,600
Part 3
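The reported allocation can be checked with a few lines of arithmetic. The share limits are the ones the client specified according to the presenter's notes (20% to 40% growth, 20% to 50% income, at least 30% money market); the 100% upper limit on the money market fund is an assumption, since no maximum is stated.

```python
TOTAL = 800_000

# fund: (amount invested, yearly yield, (minimum share, maximum share))
allocation = {
    "growth stock": (320_000, 0.18, (0.20, 0.40)),
    "income": (240_000, 0.125, (0.20, 0.50)),
    "money market": (240_000, 0.075, (0.30, 1.00)),  # upper limit assumed
}

# All of the client's money is invested.
fully_invested = sum(amount for amount, _, _ in allocation.values()) == TOTAL

# Every fund's share of the total lies inside its limits.
within_limits = all(lo - 1e-9 <= amount / TOTAL <= hi + 1e-9
                    for amount, _, (lo, hi) in allocation.values())

# 57,600 + 30,000 + 18,000
yearly_return = sum(amount * rate for amount, rate, _ in allocation.values())
```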


Senior Project Powerpoint


Editor's Notes

  1. Simple: The goal of business analytics is to analyze all the data collected in order to drive business decisions, leading to better performance. This can be done by reducing costs, improving marketing strategies, or even predicting future events. The three methods that I mentioned (clustering, classification and optimization) are common methods that are not often taught in introductory business analytics courses. Detailed: What to do with missing data: leave it out, or fill it in with average data. It depends on why the data is missing. Is it completely random, or not? Other methods include linear regression, statistical inference, time series analysis and forecasting, integer and nonlinear optimization models, etc.
  2. Simple: Clustering can be used on people for marketing strategies, buildings for city planning, geographical locations for earth observation studies, etc. Two common ways to do clustering: hierarchical clustering and k-means clustering. Here the clustering is very obvious, but most often the data is hard to visualize, because the data is very mixed, and often has more than two dimensions. Detailed: Clustering is commonly used in marketing to divide consumers into different groups, a process known as market segmentation. Once divided into groups, a firm can then tailor marketing strategies for the different groups. Clustering can be used to group and compare many things, such as people, buildings, genes, stars and geographical locations. People for marketing strategies, or buying patterns, even for banks and insurance companies. Buildings for city planning. Genes to discover genes of similar functions. Stars to compare similar celestial bodies in order to discover possible planets with life. Geographical locations for earth observation studies.
  3. This data comes from ‘Know Thy Customer’ (KTC), a financial advising company that provides personalized financial advice to its clients. If KTC is able to cluster its clients, it could tailor specific financial advice to each cluster, making its work easier. Note that this data has 7 dimensions, with both quantitative and categorical variables.
  4. Simple: Hierarchical clustering starts with each individual observation as its own cluster, groups the most similar ones, and continues to merge similar clusters until one cluster remains. The user decides when to stop the clustering. A horizontal line is drawn on this graph in order to visualize where the clustering could end. This would produce 3 clusters. Detailed: The most common measure of similarity is Euclidean distance. It doesn’t work well for categorical variables, though. If clustering only categorical variables, the matching coefficient is better: the number of variables on which two observations have the same value, divided by the total number of variables. As more and more clusters are joined, the dissimilarity between observations in each cluster increases. The distance on the Y-axis is the measure of dissimilarity. The number of vertical lines the horizontal line crosses is the number of clusters, with each cluster defined by the observations below it. In this case, there would be four clusters. Hierarchical clustering can be sensitive to outliers and to added or removed observations. It works better with smaller data sets (fewer than 500 observations). It is useful because you can see solutions with increasing numbers of clusters.
  5. Simple: The user first specifies how many clusters the computer should make. The computer then assigns each observation to a random cluster, and the centroid of each cluster is calculated. Observations are then reassigned to the cluster with the closest centroid. This step is repeated until no changes are observed, or the set iteration limit is reached. K-means works well on quantitative variables. Detailed: It is good practice to convert each variable into z-scores, so that the similarity is not dominated by one variable. Without z-scores, the similarity between observations here would be dominated by the income variable, because its values are so much bigger than those for age. A limitation of k-means is that it is not very good with categorical variables. It is good with large data sets. K-means is more visual. We can see that the three clusters are 1. younger with lower income, 2. older with lower income, and 3. older with higher income. The big dots are the centroids of each cluster. The centroid is the average of the values in each cluster.
  6. This slide shows how the algorithm continues to make new clusters as we increase the number of clusters the computer should make.
  7. Simple: Classification is a form of predictive data mining, which means we are making models that try to predict an observation’s outcome. There are three common methods for classification: logistic regression, k-nearest neighbors, and classification and regression trees. Detailed: Examples include a bank deciding whether to give a customer a loan by predicting whether they would default on it, or a firm predicting the expenditures of potential customers. The success of the model is determined by the classification error. Class 1 error is the percent of true class 1 observations that were predicted to be class 0. Class 0 error is the percent of true class 0 observations that were predicted to be class 1. The overall error rate is the percent of all observations that were incorrectly classified. When classifying, a data set is divided up into a training set and a testing set. The training set is used to build the model, and the testing set is used to test the model’s effectiveness.
  8. Simple: Logistic regression attempts to classify a categorical outcome as a function of explanatory variables. In this example, it is classifying whether a movie will win the Oscar for best picture or not, based on the number of Oscar nominations. Whether it classifies a movie as winning the best picture Oscar or not depends on the cutoff value, which we define. Detailed: The value the logistic regression gives is the probability of winning the best picture Oscar. If the calculated probability is over the set cutoff value (50%), then the movie is classified as class 1. If not, class 0. In this example, you can see that with 11 nominations and above, a movie would be classified as winning best picture. The model sets the natural log of the odds, ln(p/(1-p)), equal to a linear combination of the explanatory variables; solving for p gives the final equation p-hat = 1 / (1 + exp(-(linear equation))).
  9. Simple: With k-nearest neighbors, a new observation is compared to its nearest neighbors. The number of neighbors it is compared to is specified by the user. The observation is assigned to the category that most of its neighbors belong to. Detailed: If the percent of neighbors that are class 1 is over the cutoff value, then the new observation is classified as class 1. Otherwise, it will be class 0. The optimal value of k can be found by building models over a range of typical k values (1, … , 20), and choosing the model with the lowest classification error. This can be used on categorical and continuous outcomes. When estimating continuous outcomes with k-NN, a new observation’s outcome is determined by the average of its nearest neighbors.
  10. Simple: CART works by partitioning the data into increasingly smaller and more homogeneous groups. The user specifies the minimum number of observations a group must have before the method considers dividing it. Detailed: The measure of heterogeneity in a group of observations’ outcome classes or outcome values is the impurity. With classification trees, the impurity is based on the proportion of incorrectly classified observations. If all observations in a group are in the same class, there is zero impurity. The impurity in a regression tree is based on the variance of the outcome value for the observations in the group. Once the tree is constructed, the estimated outcome value of a new observation is the mean outcome value of the partition to which it belongs. Like clustering, the different methods here can be combined to build more sophisticated models. The optimal minimum group size can again be identified by building multiple models with different values and choosing the one with the lowest error rate.
  11. Simple: We will compare the effectiveness of the 3 methods. This data set is cell phone usage information from customers, and whether or not they cancelled their service. We will try to predict whether a current customer will cancel their service (Class 1) or not (Class 0). Detailed: This data is from a cell phone company, which tracked the number of weeks each customer has had their account, if they had recently renewed a contract or not, if they have a data plan or not, how much monthly data they use, how many customer service calls they’ve made, the average minutes they use per month, the average calls they make in a month, their monthly bill, their largest overage fee in the last 12 months, and their monthly average of roaming minutes. The company wants to predict whether a customer is likely to cancel their service (Class 1) or not (Class 0). I used all three classification methods and compared the classification errors.
  12. Simple: A model’s effectiveness is measured by its error rates. Class 0 error is the percent of customers who did not cancel but whom we predicted would cancel. Class 1 error is the percent of customers who cancelled but whom we predicted would not. CART is the clear winner. Detailed: CART is the clear winner. If we predict that someone will cancel, let’s say we spend $100 advertising to them. If they then don’t cancel, that sets us back $100. If someone cancels and we don’t predict that, we could be losing $1,000 or more per customer. So having such a low class 1 error rate means that it could be saving the company thousands or millions of dollars.
  13. Simple: Optimization is a very widely used method with many applications: telecommunications, manufacturing and transportation, flight crew scheduling, portfolio investments, and marketing techniques. Optimization seeks to maximize or minimize some objective function, like maximizing profits or minimizing costs. Detailed: Optimization can be used for telecommunications, with how to optimize call routing; manufacturing and transportation, with how to distribute goods from producers to distributors; crew scheduling, with assigning crews to flights; portfolios, with distributing investments among investment funds; marketing, with how to distribute funds among various marketing techniques; and the traveling salesman problem, which asks for the shortest route that gets you to every city you need to visit. The main goal of optimization is to maximize or minimize an objective function. The objective function is a business problem represented as a mathematical equation. This objective function is restricted by some constraints: for example, a factory can only produce so many units, or it costs a certain amount of money to travel from one location to another. This is linear optimization because each equation is a linear equation.
  14. Simple: This is an optimization problem for Par, Inc., which wants to make golf bags. There are 4 steps to the production process, and each one has a limited total number of hours that can be allotted for the step.
  15. Simple: When maximizing an objective function, like a function that calculates profit, there are always constraints that the company is bound to. You can’t have an infinite amount of material or time. These constraints are shown here on the graph as lines. The solutions to the optimization problem are the set of points that satisfy every constraint, known as the feasible region. Detailed: The black circles on the constraint lines are the corners of the feasible region, called extreme points; an optimal solution, when one exists, can be found at an extreme point. When the constraints are put on the graph and there is no feasible region, then there is no possible solution to the problem. If this were to happen, then you would need to go to management and explain that more resources are needed. If the feasible region is unbounded, so that profit could grow without limit, then that means you didn’t set up the problem correctly. If a business could maximize their profits infinitely, then that would be a business I would want to work for!
  16. Simple: To maximize the function, the objective function line is slid downwards until it intersects with a point on the feasible region. That is the optimized solution. Detailed: It is possible that the objective function is parallel to a side of the feasible region, thus intersecting many points at once. This creates multiple optimal solutions, which is useful for the company.
  17. The risk is a measurement of the potential to lose money on the investment. The higher the risk value, the riskier it is to invest in that fund. The customer further specified that 20% - 40% must go to the growth stock fund, 20% - 50% must go to the income fund, and at least 30% must go to the money market fund.
  18. The solution is hard to visualize because the problem has more than two variables. And that is the end of my presentation on business analytics. Are there any questions?