Credit Card Marketing
Classification Trees
From Building Better Models with JMP® Pro,
Chapter 6, SAS Press (2015). Grayson, Gardner
and Stephens.
Used with permission. For additional information,
see community.jmp.com/docs/DOC-7562.
Key ideas: Classification trees, validation, confusion matrix, misclassification, leaf report, ROC
curves, lift curves.
Background
A bank would like to understand the demographics and other characteristics associated with whether a
customer accepts a credit card offer. Observational data is of limited use for this kind of problem because
the company typically sees only the customers who respond to an offer. To get around this, the bank
designs a focused marketing study of 18,000 current bank customers. This focused approach allows the
bank to see who does and does not respond to the offer, and to use the demographic data already
available for each customer.
The designed approach also allows the bank to control for other potentially important factors, so that the
offer combination isn't confounded with the demographic factors. Because the data set is large and the
relationships between the response and the studied factors may be complex, a decision tree is used to
determine whether a smaller subset of factors may be more important and warrant further analysis and
study.
The Task
We want to build a model that will provide insight into why some bank customers accept credit card offers.
Because the response is categorical (either Yes or No) and we have a large number of potential predictor
variables, we use the Partition platform to build a classification tree for Offer Accepted. We are primarily
interested in understanding characteristics of customers who have accepted an offer, so the resulting
model will be exploratory in nature.1
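The chapter builds this classification tree in JMP's Partition platform. As a rough sketch of the same idea outside JMP, the tree can be grown in Python with scikit-learn. The column names below follow the data description in this chapter, but the category values and the small synthetic table are made up for illustration; the real data ships with the book as Credit Card Marketing BBM.jmp.

```python
# Hypothetical stand-in for the Partition analysis described in the text.
# The synthetic values (e.g. the Reward levels) are assumptions, not the
# actual data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small synthetic table mimicking the study's structure.
df = pd.DataFrame({
    "Offer Accepted": ["Yes", "No", "No", "Yes", "No", "No"] * 100,
    "Reward": ["Air Miles", "Cash Back", "Points"] * 200,
    "Mailer Type": ["Letter", "Postcard"] * 300,
    "Income Level": ["Low", "Medium", "High"] * 200,
})

# scikit-learn trees need numeric inputs, so one-hot encode the
# categorical predictors (JMP handles categorical splits internally).
X = pd.get_dummies(df.drop(columns="Offer Accepted"))
y = df["Offer Accepted"]

# Hold out a validation set, mirroring the chapter's use of validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1)
tree.fit(X_train, y_train)
print(tree.score(X_valid, y_valid))  # validation accuracy
```

Because the goal here is exploratory, a shallow tree (small `max_depth`) is deliberate: a few interpretable splits matter more than squeezing out accuracy.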
The Data: Credit Card Marketing BBM.jmp
The data set consists of information on the 18,000 current bank customers in the study.
Customer Number: A sequential number assigned to each customer (this column is hidden and
excluded; this unique identifier will not be used directly).
Offer Accepted: Did the customer accept (Yes) or reject (No) the offer?
Reward: The type of reward program offered for the card.
Mailer Type: Letter or postcard.
Income Level: Low, Medium or High.
# Bank Accounts Open: How many non-credit-card accounts the customer holds.
Overdraft Protection: Does the customer have overdraft protection on their checking account(s)
(Yes or No)?
Credit Rating: Low, Medium or High.
# Credit Cards Held: The number of credit cards the customer holds ...

1 In exploratory modeling, the goal is to understand the variables or characteristics that drive behaviors or particular outcomes. In
predictive modeling, the goal is to accurately predict new observations and future behaviors, given the current information and
situation.
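The key ideas list names the confusion matrix and misclassification rate, which JMP reports automatically in the Partition platform. A minimal sketch of how those quantities are computed, using made-up actual and predicted labels rather than results from this data set:

```python
# Hypothetical example: labels and counts are invented for illustration,
# not taken from the Credit Card Marketing study.
from sklearn.metrics import confusion_matrix

actual    = ["Yes", "No", "No", "Yes", "No", "No",  "No", "Yes"]
predicted = ["Yes", "No", "No", "No",  "No", "Yes", "No", "Yes"]

# Rows are the actual class, columns the predicted class.
cm = confusion_matrix(actual, predicted, labels=["Yes", "No"])
print(cm)  # [[2 1]
           #  [1 4]]

# Misclassification rate = off-diagonal count / total.
misclassified = cm[0, 1] + cm[1, 0]
rate = misclassified / cm.sum()
print(rate)  # 0.25: 2 of 8 classified wrongly
```

The same arithmetic underlies the misclassification rates reported for the training and validation sets in a partition analysis.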
- The objectives of credit scoring in assessing credit risk and forecasting good/bad applicants.
- The types of clients that are categorized for scoring, including good, bad, indeterminate, insufficient, excluded, and rejected.
- The research objectives and challenges in building statistical models to assign risk scores and monitor model performance.
- The research methodology involving data partitioning, variable binning, scorecard modeling using logistic regression, and scorecard evaluation metrics like KS, Gini, and lift.
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRYIJITCA Journal
Churn customer, one of the most important issues in customer relationship management and marketing is especially in industries such as telecommunications, the financial and insurance. In recent decades much
research has been done in this area. In this research, the index set for the reasons set reason churn customers for our customers is of particular importance. In this study we are intended to provide a formula for the index churn customers, the better to understand the reasons for customers to provide churn. Therefore, in order to evaluate the formula provided through six Classification methods (Decision tree QUEST, Decision tree C5.0, Decision tree CHAID, Decision trees CART, Bayesian network, Neural network) to evaluate the formula will be involved with individual indicators
In this paper I compare a conventional classification regression method with the state of the art machine learning technique XGBoost. This results in a major performance gain in terms of classification and expected loss.
Machine learning project called loan prediction which is implemented using different algorithms that are in machine learning and accuracy in all algorithms has been calculated and the data set has been downloaded from google and the data is splitted for training and testing purpose where 90% for training and 10% for testing the more data is used for training to increase efficiency also the system will give accurate information based on the data that is trained before implementing algorithms the steps like data collection data analyzing and data cleaning has to be performed
A Predictive Analytics Primer.Predictive analytics encompasses a variety of statistical techniques from predictive modelling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.
Accurate Campaign Targeting Using Classification AlgorithmsJieming Wei
This paper aims to build a binary classification model to help non-profit organizations efficiently target likely donors for direct mail campaigns. The authors use a dataset of over 1 million records containing demographic and campaign attributes to select relevant features and split the data into training and test sets. Several classification algorithms are tested on the data, with a neural network found to have the lowest false positive error rate, which is important to minimize costs. The authors further tune the neural network structure and regularization to optimize performance, and select a classification threshold that balances errors to maximize estimated net returns.
The document describes the 8 step data mining process:
1) Defining the problem, 2) Collecting data, 3) Preparing data, 4) Pre-processing, 5) Selecting an algorithm and parameters, 6) Training and testing, 7) Iterating models, 8) Evaluating the final model. It discusses issues like defining classification vs estimation problems, selecting appropriate inputs and outputs, and determining when sufficient data has been collected for modeling.
The document discusses building a machine learning model to predict customer churn for a telecommunications company using a dataset containing customer characteristics. It describes preprocessing the data, exploring the features, training various classification models including logistic regression, support vector machines, random forests and decision trees, and evaluating model performance. Logistic regression achieved the best results with 79% accuracy at predicting whether customers will churn. Future work could include reducing more features and testing additional models to improve accuracy for predicting telecom customer churn.
This document discusses predicting customer churn for a telecommunications company. It begins with an introduction to the problem and dataset, which contains information on 7,043 customers. It then preprocesses the data, which has 19 variables on demographic, account, and service characteristics. Various machine learning algorithms are trained and evaluated on the data, with logistic regression achieving the best accuracy of 79%. The document concludes with opportunities for future improvement and acknowledgments.
1. The document outlines a six-step process for developing scoring models: research design, data checking and variable creation, creating analysis files, calibrating the scoring model, model evaluation, and model implementation.
2. Several modeling techniques are discussed including linear regression, logistic regression, and neural networks. Key factors in choosing a technique include the target variable type and the software environment.
3. Model evaluation is done using lift tables and gains tables to assess how well the model ranks and selects customers. Graphs of these tables help understand model performance in selecting respondents and generating revenue or profit.
Similar to Credit Card Marketing Classification Trees Fr.docx (20)
Read Chapter 3. Answer the following questions1.Wha.docxShiraPrater50
Read Chapter 3
.
Answer the following questions:
1.
What can give a teacher insight into children’s language behavior?
2.
How many new words might a preschooler acquire each day?
3.
Define
receptive vocabulary and expressive vocabulary.
4.
Compare speech when a child is excited to speech when a child is embarrassed, sad, or shy.
5.
What is the focus of play for very young preschoolers?
6.
Define
regularization.
7.
What is the focus for questions during the toddler period?
8.
Define
overextension.
9.
Describe
running commentaries.
10.
List
eight (8)
possible developmental reasons and benefits of self-talk.
11.
Define
consonant and vowel.
12.
What advice should be given to families and early childhood educators?
13.
List
(four) 4
suggestions for books for younger preschoolers.
14.
List
ten (10)
expectations as preschoolers get older.
15.
Describe friendships of young preschoolers.
16. List
five (5)
areas of growth in children through group play.
17. How do children learn language?
18. Explain
relational words
and why these words are important.
19. Explain
impact words, sound words, created words
and
displaying creativity
.
20. Discuss the danger of assumptions about intelligence through language ability.
21. List
four (4)
speech and language characteristics of older preschoolers.
22. What may depress a child's vocabulary development?
23. Define
metalinguistic awareness.
24. How does physical growth affect children's perceptions of themselves?
25.
Define
mental image.
26.
Define
visual literacy.
27.
Explain the order in which motor skills are developed.
28.
Explain the
Montessori
approach to education for young children.
29. List
seventeen (17) objectives for refining perceptual-motor skills.
30.
Define
assimilation and accommodation.
31. What is a zone of proximal development?
32.
What is the teacher’s role in working with infants, toddlers and preschoolers?
33.
Define
metalinguistic skills.
34.
Define
social connectedness.
35. List
six (6)
social ability goals that serve as a strong foundation for future schooling.
.
Read Chapter 15 and answer the following questions 1. De.docxShiraPrater50
Read Chapter 15 and answer the following questions
:
1. Describe several characteristics of infants that make them different from other children.
2. What is the feeding challenge in meeting the nutritional needs of an infant?
3. Define
low-birthweight (LBW) infant
.
4. List
nine (9)
problems associated with low birth weight.
5. List
five (5)
reasons a mother may choose formula feeding instead of breast feeding.
6. List
four (4)
steps to safe handling of breast milk.
7. What
two (2)
factors determine safe preparation of formula? Briefly describe each factor.
8. Define
aseptic procedure.
9. Define
distention
and tell what causes distention.
10. Define
regurgitation, electrolytes,
and
developmental or physiological readiness.
11. Why should a bottle
NEVER
be propped and a baby left unattended while feeding?
12. When might an infant need supplemental water?
13. When should solid food be introduced to an infant? What is meant by the infant being developmentally ready?
14. Define
palmar grasp
and
pincer grip.
15. List
ten (10)
common feeding concerns. Pick
ONE
and explain why that is a concern.
Read Chapter 16 and answer the following questions:
1. Describe
toddlers and preschoolers
.
2. Define
neophobic.
3. List
three (3)
things a teacher is responsible for when feeding a toddler. List
two (2)
things for which the child is responsible.
4. Why should you
NOT
try to force a toddler to eat or be overly concerned if children are suddenly eating less?
5. Explain the results of spacing meals
too far apart
and
too close together
.
6. List a
good eating pattern
for toddlers.
7. Name several healthy snack choices for toddlers and young children.
8. List several suggestions for making eating time comfortable, pleasant and safe.
9. What changes about eating habits when a toddler develops into a preschooler?
10. Define
Down syndrome
and
Prader-Willi syndrome.
11. How can parents and teachers promote good eating habits for preschoolers?
12. When and where should rewards be offered?
13. Why should children
not
be encouraged to have a
“clean plate”?
14. List
five (5)
health conditions related to dietary patterns.
15. What is the Physical Activity Pyramid and for what is it designed?
16. List
eight (8)
common feeding concerns during toddler and preschool years. Pick
one and explain
it thoroughly.
https://books.google.com/books/about/Health_Safety_and_Nutrition_for_the_Youn.html?id=7zcaCgAAQBAJ&printsec=frontcover&source=kp_read_button#v=onepage&q&f=false
.
Read Chapter 2 and answer the following questions1. List .docxShiraPrater50
Read Chapter 2 and answer the following questions:
1. List
five (5)
decisions a teacher must make about the curriculum.
2. List
three (3)
ways that all children are alike.
3. List
three (3)
similar needs of young children.
4. Describe the change in thought from age 2 through age 11 or 12.
5. List
four (4)
ways teachers can determine children’s background experiences.
6. List
three (3)
ways to find out children’s interests.
7. List
four (4)
ways to determine the developmental levels and abilities of children.
8. What is P.L. 94-142 and what does it state?
9. List
four (4)
things you need to do as a teacher of special children regarding P.L. 94-142.
10. List
eight (8)
categories of special needs children.
11. List the
eleven (11)
goals of an inclusion program.
12.
List
and
explain three (3)
methods to gain knowledge about the culture and values of a community.
13. Why must teachers of young children understand geography, history, economics and other social sciences?
14. List
six (6)
ways children can assist with planning.
15. List
five (5)
elements that should be included in lessons plans.
16. List
four (4)
main sections that every lesson plan should include regardless of format.
17. Define
behavioral objective.
What
three (3)
questions do behavioral objectives answer?
18. What are
four (4)
goals which can be accomplished through the use of units, projects, and thematic learning?
19. List
three (3)
considerations for selecting themes or topics.
20. After selecting a theme or topic, list
seven (7)
elements that should be included in planning for the theme or unit.
21. List
five (5)
uses for authentic assessment
.
22.
List
and
describe
four (4)
types of assessments.
23. List
five (5)
things you should look for when interviewing children.
24. What are
rubrics
, and how can rubrics be used?
25. What are standardized tests and why might they
not
be useful to teachers of young children?
book
Social Studies for the Preschool/Primary Child
Carol Seefeldt; Sharon D. Castle; Renee Falconer
also you may used any addition
.
Read chapter 7 and write the book report The paper should be .docxShiraPrater50
Read chapter 7 and write the book report
The paper should be single-spaced, 2-page (excluding cover page and references) long, and typed in Times New Roman 12 points. The paper should have a title, and consists of at least two sections: 1) A brief narrative of how an IS/IT is realized, initiated, designed, and implemented in terms of what/when/where/how this happened, and key character players involved in the series of events.
.
Read Chapter 7 and answer the following questions1. What a.docxShiraPrater50
Read Chapter 7 and answer the following questions:
1. What are preschoolers like?
2. Define
large motor, coordination, agility
and
conscience
.
3. What do preschoolers do?
4. What do preschoolers need?
5. Define
sense of initiative, socialized
and
norms
.
6. List the
seven (7)
dimensions of an environment advocated by Prescott.
7. Describe an environment that provides for initiative.
8. List
six (6)
opportunities for children provided through good storage of materials.
9. Define
pictograph
.
10. List
six (6)
environments that foster initiative
.
11. Describe an environment that helps to develop creativity.
12. List
eight (8)
factors for creativity.
13. Describe an environment for learning through play.
14. Where do you begin when deciding how to set up a room?
15. What should you know about pathways in the room?
16. How can you modify a classroom for children with special needs?
17. List
seven (7)
suggestions for welcoming children with special needs.
18. Describe an environment for outdoor play.
19. List
seven (7)
suggestions for an environment that fosters play.
20. How can you plan for safety?
21. Define
interest centers, indirect guidance, private space
and
antibiased
.
22. Describe an environment that fosters self-control.
23. Define
time blocks, child-initiated,
and
teacher-initiated
.
24. List
six (6)
features found in schedules that meet children's needs.
25. List
eight (8)
principles of developmentally appropriate transitions for preschoolers.
26. Define
kindergarten
. Describe kindergarten today.
27. Define
screening, readiness tests, transitional classes
and
retention
.
28. What is the kindergarten dilemma?
29. List
five (5)
inappropriate physical environments for preschoolers.
Read Chapter 8 and answer the following questions:
1. What are primary-age children like?
2. What do primary-age children like to do?
3. Define
peers, sense of industry, competence
and
concrete
.
4. What do primary-age children need?
5. How do primary-age children learn best?
6. What are some of the concerns about public education?
7. Describe an environment for a sense of industry.
8. What is a benefit of the learning-center approach for primary-age children?
9. What is a planning contract?
10. What is an advantage to providing a number of separate learning centers?
11. What is a planning board?
12. Define
portfolio
.
13. How do teachers of primary-age children use portfolios and work samples?
14. What are two large and important learning centers related to literacy?
15. What should a writing center contain?
16. List
four (4)
suggestions for an environment that fosters early literacy.
17. Describe an environment that fosters math understanding.
18. Describe a physical environment that fosters scientific awareness.
19. Describe an environment for relationships.
20. List
five (5)
suggestions for fostering peer- and te.
Read chapter 14, 15 and 18 of the class textbook.Saucier.docxShiraPrater50
Read chapter 14, 15 and 18 of the class textbook.
Saucier Lundy, K & Janes, S.. (2016). Community Health Nursing. Caring for the Public’s Health. (3rd
ed.)
ISBN: 978-1-4496-9149-3
Once done answer the following questions;
1. How the different topics/health issues can be addressed through both professional health promotion and personal health promotion. What is the difference in the approach? How does each approach contribute to the desired effect?
2. Should health insurance companies cover services that are purely for health promotion purposes? Why or why not? What about employers? What are the pros and cons of this type of coverage?
3. What do you think about the role integrating nursing with faith? Is this something you feel is appropriate? When is it appropriate? What types of settings do you feel this would work best in? Do you feel nurses should integrate faith in their nursing practice? Why or why not and how?
4. Have you been a part of a group in which corruption of leadership has occurred? Do you feel it is unavoidable? How did you feel in that particular group?
APA format word document Arial 12 font attached to the forum in the discussion board title "Week 4 discussion questions".
A minimum of 2 evidence based references no older than 5 years old are required besides the class textbook
A minimum of 500 words without count the first and last page are required.
.
Read Chapter 10 APA FORMAT1. In the last century, what historica.docxShiraPrater50
Read Chapter 10 APA FORMAT
1. In the last century, what historical, social, political, and economic trends and issues have influenced today’s health-care system?
2. What is the purpose and process of evaluating the three aspects of health care: structure, process, and outcome?
3. How does technology improve patient outcomes and the health-care system?
4. How can you intervene to improve quality of care and safety within the health-care system and at the bedside?
5. Select one nonprofit organization or one government agencies that influences and advocates for quality improvement in the health-care system. Explore the Web site for your selected organization/agency and answer the following questions: •
What does the organization/agency do that supports the hallmarks of quality? •
What have been the results of their efforts for patients, facilities, the health-care delivery system, or the nursing profession? •
How has the organization/agency affected facilities where you are practicing and your own professional practice?
.
Read chapter 7 and write the book report The paper should b.docxShiraPrater50
Read chapter 7 and write the book report
The paper should be single-spaced, 2-page (excluding cover page and references) long, and typed in Times New Roman 12 points. The paper should have a title, and consists of at least two sections: 1) A brief narrative of how an IS/IT is realized, initiated, designed, and implemented in terms of what/when/where/how this happened, and key character players involved in the series of events.
.
Read Chapter 14 and answer the following questions1. Explain t.docxShiraPrater50
Read Chapter 14 and answer the following questions:
1. Explain the importance of proteins.
2. Define
amino acids, non-essential amino acids, essential amino acids, complete protein,
and
incomplete proteins.
3. Define
complementary proteins
and
supplementary proteins.
4. Why are
vitamins
important?
5. Define
fat soluble
and
water soluble.
6. What is
DNA
?
RNA?
7. Which vitamins play essential roles in the formation of blood cells and hemoglobin?
8. Which vitamins regulate bone growth?
9. Define
collagen.
10. Which vitamins regulate energy metabolism?
11. Define
neuromuscular
and
spina bifida.
12. What are
megadoses
?
13. Define
minerals
and tell why they are important.
14. What minerals support growth?
15. What are the major minerals found in bones and teeth?
16. Why is fluoride added to water supplies of communities? Why is fluoride important?
17. What are the major food sources of
calcium
and
phosphorus
?
18. Define
hemoglobin
. Define
iron-deficiency
anemia
.
19. What are the major food sources of iron?
20. Why is water so important to children? How is water lost and replaced in children?
21. Name
three (3)
problems caused by children drinking too much fruit juice.
https://books.google.com/books/about/Health_Safety_and_Nutrition_for_the_Youn.html?id=7zcaCgAAQBAJ&printsec=frontcover&source=kp_read_button#v=onepage&q&f=false
.
Read Chapter 2 first. Then come to this assignment.The first t.docxShiraPrater50
Read Chapter 2 first. Then come to this assignment.
The first theme of next week's class (Week 2) will be Chapter 2, Concepts of Infectious Disease. I will briefly go through the chapter to make sure that you understand it, and then we will have a discussion.
Since the chapter in the textbook is so full of important concepts, it would be difficult to narrow it down to a single topic for discussion. So I have posted this introduction and 3 separate subtopics. You can choose which one you want to write about. Each student should choose one of these subtopics for your major post. You should write well thought out primary comments on at least one of the points below (150-200 words).
BE SURE TO INCLUDE YOUR NAME AND SUBTOPIC IN THE HEADER FOR YOUR PAPER.
We will discuss each of the subtopics that were chosen by the students. Each of you should take an active role in presenting your topic to the other students. Explain the concept in your own words, or develop it further using a relevant example. As other students present their perspective on the same topic, hopefully an active discussion will take hold. I will jump in only as needed. This format will allow you to develop one subtopic in an active sense, but learn about the others by being drawn into them through other people's discussions.
Choose your subtopic:
Subtopic 1: Factors that affect the spread of epidemics
Question: Explain how the interaction between these factors are relevant to the transmission of AIDS. For example, which of these factors are most critical to the transmission of HIV. Which aren't.
1. Total number of hosts
2. Host’s birth rate
3. Rate at which new susceptible hosts migrate into population
4. Number of susceptible uninfected hosts
5. Rate at which disease can be transmitted from infected to uninfected hosts
6. Death rate of infected hosts
7. The number of infected hosts who survive and become immune or resistant to further infection
Subtopic 2: Acute versus Chronic Infections
Question: Compare the definitions of Acute Infections and Chronic Infections below. Based on what you know about HIV/AIDS at this point, which description most closely matches AIDS? Explain your answer, using evidence from the book to support your position.
What is an acute infection?
1. Produces symptoms and makes a person infectious soon after infection.
2. The infected person may: transmit the disease
die from the infection
recover and develop immunity
3. the acute microorganism
STRIKES QUICKLY
infects entire group (small group)
dies out
What is a chronic infection?
Person may never show symptoms
Person continues to carry infectious agent at a low level
Does NOT mount an effective immune response
Subtopic 3: Controlling infectious disease
Question: Explain what herd immunity is and how it works. Use an example from either the bo.
Journal of Public Affairs Education 515Teaching Grammar a.docxShiraPrater50
Journal of Public Affairs Education 515
Teaching Grammar and Editing in Public
Administration: Lessons Learned from
Early Offerings of an Undergraduate
Administrative Writing Course
Claire Connolly Knox
University of Central Florida School of Public Administration
ABSTRACT
College graduates need to possess strong writing skills before entering the work-
force. Although many public administration undergraduate programs primarily
focus on policy, finance, and management, we fall short of a larger goal if students
cannot communicate results to a variety of audiences. This article discusses the
results of a national survey, which concludes that few undergraduate public affairs
programs require an administrative/technical writing course. Based on pedagogical
theories, this article describes the design of a newly implemented, undergraduate,
administrative writing course. The article concludes with lessons learned, provides
recommendations for programs considering requiring an administrative writing
course, and discusses future research.
Keywords: administrative writing, Plain Language Movement, discourse community,
undergraduate course design
“Administrators not only need to know about communications, they need to
be able to communicate” (Denhardt, 2001, p. 529). Public administration under-
graduate students learn the importance of communication within organizations
in leadership, human resources, or organizational management courses; however,
practical instruction in communication skills, such as effective, audience-centered
writing, are lacking. Scholars (e.g., Cleary, 1990, 1997; Lee, 2000; Raphael &
Nesbary, 2005; Waugh & Manns, 1991) have noted this lack of required commun-
ication and writing courses in public administration curriculum. The majority of
administrative writing literature is from the late 1980s and early 1990s when
universities began implementing Writing Across the Curriculum programs (i.e.,
JPAE 19 (3), 515–536
516 Journal of Public Affairs Education
Londow, 1993; Stanford, 1992). The limited discussions and conclusions coincide
with private and public sector trends—newly hired students’ writing skills are
lacking (Hines & Basso, 2008; National Commission, 2005).
A survey by the National Commission on Writing for America’s Families,
Schools, and Colleges (2005) reported that approximately 80% of public sector
human resource directors seriously considered writing skills when hiring professional
employees and assumed new employees obtained these skills in college. Increasingly,
public managers require employees to attend writing and communication trainings,
which cost governments approximately $221 million annually (National Commis-
sion, 2005). In fact, the public sector (66%) is more likely to send professional/
salaried employees for writing training than the private sector (40%; National
Commission, 2005). Public, private, and nonprofit sector organizations certainly
should cont ...
This document provides guidance on managing suppliers for the TLIR5014 unit. It covers assessing suppliers and building relationships, evaluating delivery against agreements, negotiating with suppliers, resolving disagreements, and reviewing performance. Key areas discussed include developing criteria to evaluate suppliers; maintaining cooperative relationships; establishing performance indicators; developing evaluation methods; managing relationships; and continuously reviewing suppliers for quality, profitability and other metrics. The role of the supply/contract manager and importance of a contract management plan are also outlined.
MBA 6941, Managing Project Teams 1 Course Learning Ou.docxShiraPrater50
The document provides an overview of key concepts and processes related to project scope management and time management. It defines scope management as the processes used to define, control, and validate the work required to successfully deliver a project. It outlines six processes for scope management including planning scope management, collecting requirements, defining scope, creating a work breakdown structure, validating scope, and controlling scope. It also defines seven processes for time management including planning schedule management, defining activities, sequencing activities, estimating activity resources and durations, developing the schedule, and controlling the schedule. The critical path is described as the longest path through a project network diagram that determines the shortest project duration.
Inventory Decisions in Dells Supply ChainAuthor(s) Ro.docxShiraPrater50
Inventory Decisions in Dell's Supply Chain
Author(s): Roman Kapuscinski, Rachel Q. Zhang, Paul Carbonneau, Robert Moore and Bill
Reeves
Customer Number: A sequential number assigned to the customers (this column is hidden and excluded – this unique identifier will not be used directly).
Offer Accepted: Did the customer accept (Yes) or reject (No) the offer.
Reward: The type of reward program offered for the card.
Mailer Type: Letter or postcard.
Income Level: Low, Medium or High.
# Bank Accounts Open: How many non-credit-card accounts are held by the customer.
1 In exploratory modeling, the goal is to understand the variables or characteristics that drive behaviors or particular outcomes. In predictive modeling, the goal is to accurately predict new observations and future behaviors, given the current information and situation.
Overdraft Protection: Does the customer have overdraft protection on their checking account(s) (Yes or No).
Credit Rating: Low, Medium or High.
# Credit Cards Held: The number of credit cards held at the bank.
# Homes Owned: The number of homes owned by the customer.
Household Size: Number of individuals in the family.
Own Your Home: Does the customer own their home? (Yes or No).
Average Balance: Average account balance (across all accounts over time).
Q1, Q2, Q3 and Q4 Balance: Average balance for each quarter in the last year.
Prepare for Modeling
We start by getting to know our data. We explore the data one variable at a time, two at a time, and many variables at a time to gain an understanding of data quality and of potential relationships. Since the focus of this case study is classification trees, only some of this work is shown here. We encourage you to thoroughly understand your data and take the necessary steps to prepare your data for modeling before building exploratory or predictive models.
Exploring Data One Variable at a Time
Since we have a relatively large data set with many potential predictors, we start by creating numerical summaries of each of our variables using the Columns Viewer (see Exhibit 1). (Under the Cols menu select Columns Viewer, then select all variables and click Show Summary. To deselect the variables, click Clear Select).
Exhibit 1 Credit, Summary Statistics for All Variables With Columns Viewer
Under N Categories, we see that each of our categorical variables has either two or three levels. N Missing indicates that we are missing 24 observations for each of the balance columns. (Further investigation indicates that these values are missing from the same 24 customers.) The other statistics provide an idea of the centering, spread and shapes of the continuous distributions.
Next, we graph our variables one at a time. (Select the variables within the Columns Viewer and click on the Distribution button. Or, use Analyze > Distribution, select all of the variables as Y, Columns, and click OK. Click Stack from the top red triangle for a horizontal layout).
In Exhibit 2, we see that only around 5.68 percent of the 18,000 offers were accepted.
Exhibit 2 Credit, Distribution of Offer Accepted
We select the Yes level in Offer Accepted and then examine the distribution of accepted offers (the shaded area) across the other variables in our data set (the first 10 variables are shown in Exhibit 3).
Our two experimental variables are Reward and Mailer Type. Offers promoting Points and Air Miles are more frequently accepted than those promoting Cash Back, while Postcards are accepted more often than Letters. Offers also appear to be accepted at a higher rate by customers with low to medium income, no overdraft protection and low credit ratings.
Exhibit 3 Credit, Distribution of First 10 Variables
Note that both Credit Rating and Income Level are coded as Low, Medium and High. Peeking at the data table, we see that the modeling types for these variables are both nominal, but since the values are ordered categories they should be coded as ordinal variables. To change the modeling type, right-click on the modeling type icon in the data table or in any dialog window, and then select the correct modeling type.
Exploring Data Two Variables at a Time
We explore relationships between our response and potential predictor variables using Analyze > Fit Y by X (select Offer Accepted as Y, Response and the predictors as X, Factor, and click OK). For categorical predictors (nominal or ordinal), Fit Y by X conducts a contingency analysis (see Exhibit 4). For continuous predictors, Fit Y by X fits a logistic regression model.
The first two analyses in Exhibit 4 show potential relationships between Offer Accepted and Reward (left) and Offer Accepted and Mailer Type (right). Note that tests for association between the categorical predictors and Offer Accepted are also provided by default (these are not shown in Exhibit 4), and additional statistical options and tests are provided under the top red triangles.
Exhibit 4 Credit, Fit Y by X, Offer Accepted versus Reward and Mailer Type
Although we haven’t thoroughly explored this data, thus far we’ve learned that:
• Only a small percentage – roughly 5.68 percent – of offers are accepted.
• We are missing some data for the Balance columns, but are not missing values for any other variable.
• Both of our experimental variables (Reward and Mailer Type) appear to be related to whether or not an offer is accepted.
• Two variables, Income Level and Credit Rating, should be coded as Ordinal instead of Nominal.
Again, we encourage you to thoroughly explore your data and to investigate and resolve potential data quality issues before building a model. Other tools, such as scatterplots and the Graph Builder, should also be used.
While we have only superficially explored the data in this example (as we will also do in future examples), the primary purpose of this exploration is to get to know the variables used in this case study. As such, it is intentionally brief.
The Partition Model Dialog
Having a good sense of data quality and potential relationships, we now fit a partition model to the data using Analyze > Modeling > Partition, with Offer Accepted as Y, Response and all of the other variables as X, Factor (see Exhibit 5).
Exhibit 5 Credit, Partition Dialog Window
A Bit About Model Validation
When we build a statistical model, there is a risk that the model is overly complicated (overfit), or that the model will not perform well when applied to new data. Model validation (or cross-validation) is often used to protect against overfitting.2 There are two methods for model validation available from the Partition dialog window in JMP Pro: specify a Validation Portion or select a Validation column (note that other methods are available from within the platform).
In this example, we’ll use a random hold out portion (30 percent) to protect against overfitting (to do so, enter 0.30 in the Validation Portion field). This will assign 70 percent of the records to the training set, which is used to build the model. The remaining 30 percent will be assigned to the hold out validation set, which will be used to see how well the model performs on data not used to build the model.
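The holdout mechanics can be sketched in a few lines of Python. This is an illustration only, not JMP’s internal code: JMP assigns each row to the holdout at random, so the realized counts vary slightly from an exact 70/30 split (12,610 and 5,390 later in this case study, rather than exactly 12,600 and 5,400), while the sketch below holds out an exact fraction.

```python
import random

def holdout_split(n_rows, validation_portion=0.30, seed=None):
    """Randomly hold out a portion of row indices for validation.

    A sketch of a random holdout, not JMP's implementation: this version
    holds out an exact fraction, whereas JMP assigns rows independently,
    so its realized training/validation counts vary from run to run.
    """
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    n_validation = round(n_rows * validation_portion)
    # (training indices, validation indices)
    return rows[n_validation:], rows[:n_validation]

training, validation = holdout_split(18000, 0.30, seed=123)
```

The training indices are used to grow the tree; the validation indices are touched only to measure performance and to decide when to stop splitting.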
Other Partition Model Dialog Options
Two additional options in the dialog window are Informative Missing and Ordinal Restricts Order. These are selected by default. In this example, we have two ordinal predictors, Credit Rating and Income Level. We also have missing values for the five balance columns. The Informative Missing option tells JMP to include rows with missing values in the model, and the Ordinal Restricts Order option tells JMP to respect the ordered categories for ordinal variables. For more information on these options, see JMP Help.
The completed Partition dialog window is displayed in Exhibit 5.
2 For background information on model validation and protecting against overfitting, see en.wikipedia.org/wiki/Overfitting. For more information on validation in JMP and JMP Pro, see Building Better Models with JMP Pro, Chapter 6 and Chapter 8, or search for “validation” in the JMP Help.
Building the Classification Tree
Initial results show the overall breakdown of Offer Accepted (Exhibit 6). Recall that roughly 5.7 percent of offers were accepted. Note: Since a random holdout is used, your results from this point forward may be different.3
Below the graph, we see that 12,610 observations are assigned to the training set. These observations will be used to build the model. The remaining 5,390 observations in the validation set will be used to check model performance and to stop tree growth.
Note that we have changed some of the default settings:
• Since we have a relatively large data set, points were removed from the graph (click on the top red triangle and select Display Options > Show Points).
• The response rates and counts are displayed in the tree nodes (select Display Options > Show Split Count from the top red triangle).
Exhibit 6 Credit, Partition Initial Window
3 To obtain the same results as shown here, use the Random Seed Reset add-in to set the random seed to 123 before launching the Partition platform. The add-in can be downloaded and installed from the JMP User Community: community.jmp.com/docs/DOC-6601.
How Classification Trees Are Formed
When building a classification tree, JMP iteratively splits the data based on the values of predictors to form subsets. These subsets form the “branches” of the tree. Each split is made at the predictor value that causes the greatest difference in proportions (for the outcome variable) in the resulting subsets.
A measure of the dissimilarity in proportions between the two subsets is the likelihood ratio chi-square statistic and its associated p-value. The lower the p-value, the greater the difference between the groups. When JMP calculates this chi-square statistic in the Partition platform, it is labeled G^2, and the p-value that is calculated is adjusted to account for the number of splits that are being considered. The adjusted p-value is transformed to a log scale using the formula -log10(adjusted p-value). This value is called the LogWorth. The bigger the LogWorth value, the better the split (Sall, 2002).
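As a numeric illustration of the -log10(p-value) transformation, the LogWorth of a given G^2 value can be computed with only the Python standard library. This sketch uses the unadjusted single-degree-of-freedom chi-square p-value; as noted above, JMP adjusts the p-value for the number of candidate splits before taking the logarithm.

```python
import math

def logworth(g2):
    """LogWorth = -log10(p-value) for a likelihood-ratio chi-square G^2.

    Uses the exact 1-degree-of-freedom chi-square tail probability,
    p = erfc(sqrt(G^2 / 2)). This is the unadjusted value; JMP adjusts
    the p-value for the number of candidate splits considered.
    """
    p = math.erfc(math.sqrt(g2 / 2.0))
    return -math.log10(p)
```

For example, G^2 = 3.84 (the familiar 5 percent critical value for one degree of freedom) gives a LogWorth of about 1.30, i.e. -log10(0.05); larger G^2 values give larger LogWorths.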
To find the split with the largest difference between subgroups (and the corresponding largest value of LogWorth), we need to consider all possible splits. For each variable, the best split location, or cut point, is determined, and the split with the highest LogWorth is chosen as the optimal split location.
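The cut-point search can be sketched for a single continuous predictor and a binary response. This is illustrative Python, not JMP’s implementation; because the split statistic has one degree of freedom, the cut with the largest G^2 is also the cut with the largest LogWorth, so it is enough to maximize G^2.

```python
import math

def g2(yes_left, no_left, yes_right, no_right):
    """Likelihood-ratio chi-square for a binary outcome split into two branches."""
    n = yes_left + no_left + yes_right + no_right
    col_yes, col_no = yes_left + yes_right, no_left + no_right
    stat = 0.0
    for branch_yes, branch_no in ((yes_left, no_left), (yes_right, no_right)):
        branch_n = branch_yes + branch_no
        for observed, col in ((branch_yes, col_yes), (branch_no, col_no)):
            expected = branch_n * col / n  # count expected under no association
            if observed > 0:
                stat += 2.0 * observed * math.log(observed / expected)
    return stat

def best_cut_point(values, outcomes, positive="Yes"):
    """Scan every midpoint between distinct sorted values of one predictor
    and return the cut point with the largest G^2 (equivalently, for one
    degree of freedom, the largest LogWorth), plus that G^2 value."""
    pairs = sorted(zip(values, outcomes))
    best_score, best_cut = float("-inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal predictor values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left, right = pairs[:i], pairs[i:]
        yes_l = sum(1 for _, o in left if o == positive)
        yes_r = sum(1 for _, o in right if o == positive)
        score = g2(yes_l, len(left) - yes_l, yes_r, len(right) - yes_r)
        if score > best_score:
            best_score, best_cut = score, cut
    return best_cut, best_score
```

JMP repeats this scan for every predictor (and over level groupings for categorical predictors) and splits on the variable and cut point with the highest LogWorth.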
JMP reports the G^2 and LogWorth values, along with the best cut points for each variable, under Candidates (use the gray disclosure icon next to Candidates to display). A peek at the candidates in our example indicates that the first split will be on Credit Rating, with Low in one branch and High and Medium in the other (Exhibit 7).
Exhibit 7 Credit, Partition Initial Candidate Splits
The tree after three splits (click Split three times) is shown in Exhibit 8.
Not surprisingly, the model is split on Credit Rating, Reward and Mailer Type. The lowest probability of accepting the offer (0.0196) occurs in the branch Credit Rating(Medium, High) and Reward(Cash Back, Points). The highest probability (0.1473) occurs in the branch Credit Rating(Low) and Mailer Type(Postcard).
After each split, the model RSquare (or, Entropy RSquare) updates (this is shown at the top of Exhibit 8). RSquare is a measure of how much variability in the response is being explained by the model. Without a validation set, we can continue to split until the minimum split size is reached in each branch. (The minimum split size is an option under the top red triangle, which is set to 5 by default.) However, additional splits are not necessarily beneficial and can lead to more complex and potentially overfit models.
Exhibit 8 Credit, Partition after Three Splits
Since we have a validation set, we click Go to automate the tree-building process. When this option is used, the final model will be based on the model with the maximum value of the Validation RSquare statistic.
The Split History report (Exhibit 9) shows how the RSquare value changes for training and validation data after each split (note that the y-axis has been rescaled for illustration). The vertical line is drawn at 15, the number of splits used in the final model.
Exhibit 9 Split History, with Maximum Validation R-Square at Fifteen Splits
This illustrates both the concept of overfitting and the importance of using validation. With each split, the RSquare for the training data continues to increase. However, after 15 splits, the validation RSquare (the lower line in Exhibit 9) starts to decrease. For the validation set, which was not used to build the model, additional splits are not improving our ability to predict the response.
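The stopping rule is simply the split count at the peak of the validation curve. A minimal sketch (the RSquare values in the usage note are hypothetical, chosen only to mimic the rise-then-fall pattern described above):

```python
def prune_point(validation_r2):
    """Return the number of splits with the highest validation RSquare.

    validation_r2[k] holds the validation RSquare after k + 1 splits.
    Training RSquare rises with every split, but the validation curve
    peaks and then declines; that peak is the chosen model size.
    """
    best_index = max(range(len(validation_r2)), key=lambda k: validation_r2[k])
    return best_index + 1
```

For example, with the hypothetical sequence [0.01, 0.03, 0.05, 0.04, 0.02], prune_point returns 3, since the third split gives the highest validation RSquare.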
Understanding the Model
To summarize which variables are involved in these 15 splits, we turn on Column Contributions (from the top red triangle). This table indicates which variables are most important in terms of the overall contribution to the model (see Exhibit 10).
Credit Rating, Mailer Type, Reward and Income Level contribute most to the model. Several variables, including the five balance variables, are not involved in any of the splits.
Exhibit 10 Credit, Column Contributions after Fifteen Splits
Model Classifications and the Confusion Matrix
One overall measure of model accuracy is the Misclassification Rate (select Show Fit Details from the top red triangle). The misclassification rate for our validation data is 0.0573, or 5.73 percent. The numbers behind the misclassification rate can be seen in the confusion matrix (bottom, in Exhibit 11). Here, we focus on the misclassification rate and confusion matrix for the validation data. Since these data were not used in building the model, this approach provides a better indication of how well the model classifies our response, Offer Accepted.
There are four possible outcomes in our classification:
• An accepted offer is correctly classified as an accepted offer.
• An accepted offer is misclassified as not accepted.
• An offer that was not accepted is correctly classified as not accepted.
• An offer that was not accepted is misclassified as accepted.
One observation is that there were few cases wherein the model predicted that the offer would be accepted (see the value “2” in the Yes column of the validation confusion matrix in Exhibit 11). When the target variable is unbalanced (i.e., there are far more observations in one level than in the other), the model that is fit will usually result in probabilities that are small for the underrepresented category.
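The four outcomes and the misclassification rate can be computed directly from the actual and predicted classes. This is a generic Python sketch of the arithmetic behind the report, not JMP’s own code.

```python
def confusion_counts(actual, predicted, labels=("Yes", "No")):
    """counts[(a, p)] = number of rows whose actual class is a and whose
    predicted class is p -- the four cells of the confusion matrix."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def misclassification_rate(actual, predicted):
    """Share of rows whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)
```

The two off-diagonal cells of the matrix, divided by the total row count, give the misclassification rate reported in the Fit Details.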
Exhibit 11 Credit, Fit Details With Confusion Matrix
In this case, the overall rate of Yes (i.e., offer accepted) is 5.68 percent, which is close to the misclassification rate for this model. However, when we examine the Leaf Report for the fitted model (Exhibit 12), we see that there are branches in the tree that have much richer concentrations of Offer Accepted = Yes than the overall average rate. (Note that results in the Leaf Report are for the training data.)
Exhibit 12 Credit, Leaf Report for Fitted Model
The model has probabilities of Offer Accepted = Yes in the range [0.0044, 0.6738]. When JMP classifies rows with the model, it uses a default of Prob > 0.5 to make the decision. In this case, only one of the predicted probabilities of Yes is > 0.5, and this one branch (or node) has only 11 observations: 8 Yes and 3 No under Response Counts in the bottom table in Exhibit 12. The next highest predicted probability of Offer Accepted = Yes is 0.2056. As a result, all other rows are classified as Offer Accepted = No.
The ROC Curve
Two additional measures of accuracy used when building classification models are Sensitivity and 1-Specificity. Sensitivity is the true positive rate. In our example, this is the ability of our model to correctly classify Offer Accepted as Yes. The second measure, 1-Specificity, is the false positive rate. In this case, a false positive occurs when an offer was not accepted, but was classified as Yes (accepted).
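Both rates can be computed for any decision threshold from the actual responses and the predicted probabilities. The sketch below is illustrative (the function name is our own); it implements the rule "classify Yes when the predicted probability exceeds the threshold."

```python
def sensitivity_and_fpr(actual, prob_yes, threshold=0.5):
    """Sensitivity (true positive rate) and 1 - Specificity (false positive
    rate) for the rule: classify as Yes when prob_yes > threshold."""
    true_pos = false_pos = n_pos = n_neg = 0
    for a, p in zip(actual, prob_yes):
        predicted_yes = p > threshold
        if a == "Yes":
            n_pos += 1
            true_pos += predicted_yes   # correctly flagged acceptance
        else:
            n_neg += 1
            false_pos += predicted_yes  # non-acceptance flagged as Yes
    return true_pos / n_pos, false_pos / n_neg
```

Lowering the threshold raises both rates together; sweeping it from 1 down to 0 produces the pairs of points plotted next.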
Instead of using the default decision rule of Prob > 0.5, we examine the decision rule Prob > T, where we let the decision threshold T range from 0 to 1. We plot Sensitivity (on the y-axis) versus 1-Specificity (on the x-axis) for each possible threshold value. This creates a Receiver Operating Characteristic (ROC) curve. The ROC curve for our model is displayed in Exhibit 13 (this is a top red triangle option in the Partition report).
Exhibit 13 Credit, ROC Curve for Offer Accepted
Conceptually, what the ROC curve measures is the ability of the predicted probability formulas to rank an observation. Here, we simply focus on the Yes outcome for the Offer Accepted response variable. We save the probability formula to the data table, and then sort the table from highest to lowest probability. If this probability model can correctly classify the outcomes for Offer Accepted, we would expect to see more Yes response values at the top (where the probability for Yes is highest) than No responses. Similarly, at the bottom of the sorted table, we would expect to see more No than Yes response values.
Constructing an ROC Curve
What follows is a practical algorithm to quickly draw an ROC
curve after the table has been sorted by the
predicted probability. Here, we walk through the algorithm for
Offer Accepted = Yes, but this is done
automatically in JMP for each response category.
For each observation in the sorted table, starting at the observation with the highest predicted probability of Offer Accepted = Yes:
• If the observed response value is Yes, then a vertical line
segment (increasing along the Sensitivity
axis) is drawn. The length of the line segment is 1/(total number
of Yes responses in the table).
• If the observed response value is No, then a horizontal line segment (increasing along the 1-Specificity axis) is drawn. The length of the line segment is 1/(total number of No responses in the data table).
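The two bullet steps above can be sketched in code (a Python illustration, not JMP's implementation). The final point always lands at (1, 1), and summing the rectangles added by the horizontal steps gives the area under the curve:

```python
# Walk the table sorted by descending predicted probability; each
# observed Yes adds a vertical step of height 1/(#Yes), each No adds
# a horizontal step of width 1/(#No).
def roc_points(sorted_outcomes):
    n_yes = sorted_outcomes.count("Yes")
    n_no = sorted_outcomes.count("No")
    x = y = 0.0
    points = [(x, y)]
    for outcome in sorted_outcomes:
        if outcome == "Yes":
            y += 1 / n_yes          # up, along the Sensitivity axis
        else:
            x += 1 / n_no           # right, along the 1-Specificity axis
        points.append((x, y))
    return points

def auc(points):
    # area under the step curve: each horizontal (No) step adds a rectangle
    return sum((x1 - x0) * y0
               for (x0, y0), (x1, _y1) in zip(points, points[1:]))

# The 8-observation ranking used in the example that follows:
curve = roc_points(["Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"])
```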
Simple ROC Curve Examples
We use a simple example to illustrate. Suppose we have a data
table with only 8 observations. We sort
these observations from high to low based on the probability
that the Outcome = Yes. The sorted actual
response values are Yes, Yes, Yes, No, Yes, Yes, No and Yes.
This results in the ROC curve on the left
of Exhibit 14. Arrows have been added to show the steps in the
ROC curve construction. The first three
line segments are drawn up because the first three sorted values
have Outcome = Yes.
Now, suppose that we have a different probability model that
we use to rank the observations, resulting in
the sorted outcomes Yes, No, Yes, No, No, Yes, No and Yes.
The ROC curve for this situation is shown
on the right of Exhibit 14. The first ROC curve moves “up”
faster than the second curve. This is an
indication that the first model is doing a better job of separating
the Yes responses from the No
responses based on the predicted probability.
Exhibit 14 ROC Curve Examples
Referring back to the sample ROC curve in Exhibit 13, we see
that JMP has also displayed a diagonal
reference line on the chart, which represents the Sensitivity = 1-Specificity line. If a probability model
cannot sort the data into the correct response category, then it
may be no better than simply sorting at
random. In this case, the ROC curve for a “random ranking”
model would be similar to this diagonal line.
A model that sorts the data perfectly, with all the Yes responses
at the top of the sorted table, would have
an ROC curve that goes from the origin of the graph straight up to Sensitivity = 1, then straight over to 1-Specificity = 1. A model that sorts perfectly can be made into a classifier rule that classifies perfectly; that is, a classifier rule that has a Sensitivity of 1.0 and a 1-Specificity of 0.0.
The area under the curve, or AUC (labeled Area in Exhibit 13), is a measure of how well our model sorts
the data. The diagonal line, which would represent a random
sorting model, has an AUC of 0.5. A perfect
sorting model has an AUC of 1.0. The area under the curve for
Offer Accepted = Yes is 0.7369 (see
Exhibit 13), indicating that the model predicts better than the
random sorting model.
The Lift Curve
Another measure of how well a model can sort outcomes is the
model lift. As with the ROC curve, we
examine the table that is sorted in descending order of predicted
probability. For each sorted row, we
calculate the sensitivity and divide that by the proportion of values in the table where Offer Accepted = Yes. This value is the model lift.
Lift is a measure of how much “richness” in the response we
achieve by applying a classification rule to
the data. A lift curve plots the Lift (on the y-axis) against the Portion (on the x-axis). Again, consider
the data table that has been sorted by the predicted probability
of a given outcome. As we go down the
table from the top to the bottom, portion is the relative position
of the row that we are considering. The top
10 percent of rows in the sorted table corresponds to a portion
of 0.1, the top 20 percent of rows
corresponds to a portion of 0.2, and so on. The lift for Offer
Accepted = Yes for a given portion is simply
the proportion of Yes responses in this portion, divided by
overall proportion of Yes responses in the
entire data table.
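The lift calculation just described can be sketched as follows (a Python illustration with an invented outcome list, not the case-study data):

```python
# Lift at a given portion: the Yes rate in the top `portion` of rows
# (sorted by descending predicted probability) divided by the overall
# Yes rate. The outcome list below is invented for illustration.
def lift_at(sorted_outcomes, portion):
    n = len(sorted_outcomes)
    k = max(1, round(n * portion))               # rows in the top portion
    top_rate = sorted_outcomes[:k].count("Yes") / k
    overall_rate = sorted_outcomes.count("Yes") / n
    return top_rate / overall_rate

# 20 sorted rows, 4 Yes overall, 3 of them in the top 4 rows:
outcomes = ["Yes"] * 3 + ["No", "Yes"] + ["No"] * 15
```

At portion = 0.2 the top 4 rows contain 3 Yes (a rate of 0.75) against an overall rate of 0.2, for a lift of 3.75; at portion = 1.0 the lift is always exactly 1.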
The higher the lift at a given portion, the better our model is at
correctly classifying the outcome within this
portion. For Offer Accepted = Yes, the lift at Portion = 0.15 is
roughly 2.5 (see Exhibit 15). This means
that in rows in the data table corresponding to the top 15% of
the model’s predicted probabilities, the
number of actual Yes outcomes is 2.5 times higher than we
would expect if we had chosen 15 percent of
rows from the data set at random. If the model does not sort the
data well, then the lift will hover at
around 1.0 across all portion values.
Exhibit 15 Lift Curve for Offer Accepted
Lift provides another measure of how good our model is at
classifying outcomes; it is particularly useful
when the overall predicted probabilities are lower than 0.5 for
the outcome that we wish to predict.
Though in this example the majority of the predicted
probabilities of Offer Accepted = Yes were less than
0.2, the lift curve indicates that there are threshold values that
we could use with the predicted probability
model to create a classifier rule that will be better than guessing
at random. This rule can be used to
identify portions of our data that contain a much richer number
of customers who are likely to accept an
offer.
For categorical response models, the misclassification rate,
confusion matrix, ROC curve and lift curve all
provide measures of model accuracy; each of these should be
used to assess the quality of the prediction
model.
Summary
Statistical Insights
In this case study, a classification tree was used to predict the
probability of an outcome based on a
set of predictor variables using the Partition platform. If the
response variable is continuous rather
than categorical, then a regression tree can be used to predict
the mean of the response.
Construction of regression trees is analogous to construction of classification trees; however, splits are based on the mean response value rather than the probability of outcome categories.
Implications
This model was created for explanatory rather than predictive
purposes. Our goal was to understand
the characteristics of customers most likely to accept a credit
card offer. In a predictive model, we are
more interested in creating a model that accurately predicts the
response (i.e., predicts future
customer behavior) than we are in identifying important
variables or characteristics.
JMP Features and Hints
In this case study we used the Columns Viewer and Distribution platforms to explore variables one at a time, and Fit Y by X to explore the relationship between our response (or target) variable and the predictor variables. This exploratory work was shown only in part here.
We used the misclassification rate and confusion matrix as overall measures of model accuracy, and introduced the ROC and lift
curves as additional measures of
accuracy. As discussed, ROC and lift curves are particularly
useful in cases where the probability of
the target response category is low.
Note: The cutoff for classification used throughout JMP is 0.50.
In some modeling situations it may be
desirable to change the cutoff of classification (say, when the
probability of response is extremely
low). This effect can be achieved manually by saving the
prediction formula to the data table, and
then creating a new formula column that classifies the outcome based on a specified cutoff. In the sample formula below, JMP will classify an outcome as “Yes” if the predicted probability of survival is at least 0.30. The Tabulate platform (under Analyze) can then be
used to manually create a confusion
matrix.
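A rough Python analogue of this manual approach (the 0.30 cutoff follows the text; the probabilities and actual responses below are invented for illustration):

```python
from collections import Counter

# Apply a non-default cutoff to saved predicted probabilities and
# tally a confusion matrix. Probabilities and actuals are invented.
CUTOFF = 0.30

def classify(prob_yes, cutoff=CUTOFF):
    # classify as Yes when the predicted probability meets the cutoff
    return "Yes" if prob_yes >= cutoff else "No"

def confusion(actuals, probs, cutoff=CUTOFF):
    # keys are (actual, predicted) pairs, values are counts
    return Counter((a, classify(p, cutoff)) for a, p in zip(actuals, probs))

actuals = ["Yes", "No", "Yes", "No", "No"]
probs = [0.45, 0.10, 0.25, 0.35, 0.05]
cm = confusion(actuals, probs)
```

Re-running `confusion` over a grid of cutoff values mimics what the add-in described below does: one confusion matrix per candidate cutoff.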
An add-in from the JMP User Community can also be used to
change the cutoff for classification
(community.jmp.com/docs/DOC-6901). This add-in allows the
user to enter a range of values for the
cutoff, and produces confusion matrices for each cutoff value.
The goal is to find a cutoff that
minimizes the misclassification rate on the validation set.
Exercises
Exercise 1: Use the Credit Card Marketing BBM.jmp data set to
answer the following
questions:
a. The Column Contributions output after 15 splits is shown in
Exhibit 10. Interpret this
output. How can this information be used by the company?
What is the potential value of identifying these characteristics?
b. Recreate the output shown in Exhibits 9-11, but instead use
the split button to manually
split. Create a classification tree with 25 splits.
c. How did the Column Contributions report change?
d. How did the Misclassification Rate and Confusion Matrix
change? How did the ROC or
Lift Curves change? Did these additional splits provide any
additional (useful)
information?
e. Why is this an exploratory model rather than a predictive
model? Describe the difference
between exploratory and predictive models.
Exercise 2: Use the Titanic Passengers.jmp data set in the JMP
Sample Data Library (under
the Help menu) for this exercise.
This data table describes the survival status of 1,309 of the
1,324 individual passengers on the
Titanic. Information on the 899 crew members is not included.
Some of the variables are
described below:
Name: Passenger Name
Survived: Yes or No
Passenger Class: 1, 2, or 3 corresponding to 1st, 2nd or 3rd
class
Sex: Passenger sex
Age: Passenger age
Siblings and Spouses: The number of siblings and spouses
aboard
Parents and Children: The number of parents and children
aboard
Fare: The passenger fare
Port: Port of embarkation (C = Cherbourg; Q = Queenstown; S =
Southampton)
Home/Destination: The home or final intended destination of
the passenger
Build a classification tree for Survived by determining which
variables to include as predictors.
Do not use model validation for this exercise. Use Column
Contributions and Split History to
determine the optimal number of splits.
a. Which variables, if any, did you choose not to include in the
model? Why?
b. How many splits are in your final tree?
c. Which variables are the largest contributors?
d. What is your final model? Save the prediction formula for
this model to the data table (we
will refer to it in the next exercise).
e. What is the misclassification rate for this model? Is the
model better at predicting survival
or non-survival? Explain.
f. What is the area under the ROC curve for Survived? Interpret
this value. Does the model
do a better job of classifying survival than a random model?
g. What is the lift for the model at portion = 0.1 and at portion
= 0.25? Interpret these values.
Exercise 3: Use the Titanic Passengers.jmp data set for this
exercise. Use the Fit Model
platform to create a logistic regression model for Survived
using the other variables as
predictors. Include interaction terms you think might be
meaningful or significant in predicting the
probability of survival.
For information on fitting logistic regression models, see the
guide and video at
community.jmp.com/docs/DOC-6794.
a. Which variables are significant in predicting the probability
of survival? Are any of the
interaction terms significant?
b. What is the misclassification rate for your final logistic model?
c. Compare the misclassification rates for the logistic model and
the partition model created
in Exercise 2. Which model is better? Why?
d. Compare this model to the model produced using a
classification tree. Which model
would be easier to explain to a non-technical person? Why?
SAS Institute Inc. World Headquarters +1 919 677 8000
JMP is a software solution from SAS. To learn more about
SAS, visit www.sas.com
For JMP sales in the US and Canada, call 877 594 6567 or go to
www.jmp.com
SAS and all other SAS Institute Inc. product or service names
are registered trademarks or trademarks of SAS Institute Inc. in
the USA and other countries.
® indicates USA registration. Other brand and product names
are trademarks of their respective companies. S81971.1111
Case 2 - Baggage Complaints:
Descriptive Statistics and Time Series
Plots
Marlene Smith, University of Colorado Denver
Business School
2
Baggage Complaints:
Descriptive Statistics and Time Series Plots
Background
Anyone who travels by air knows that occasional problems are
inevitable. Flights can be delayed or
cancelled due to weather conditions, mechanical problems, or
labor strikes, and baggage can be lost,
delayed, damaged, or pilfered. Given that many airlines are
now charging for bags, issues with baggage
are particularly annoying. Baggage problems can have a serious
impact on customer loyalty, and can be
costly to the airlines (airlines often have to deliver bags).
Air carriers report flight delays, cancellations, overbookings,
late arrivals, baggage complaints, and other
operating statistics to the U.S. government, which compiles the
data and reports it to the public.
The Task
Do some airlines do a better job of handling baggage? Compare
the baggage complaints for three
airlines: American Eagle, Hawaiian, and United. Which airline
has the best record? The worst? Are
complaints getting better or worse over time? Are there other
factors, such as destinations, seasonal
effects or the volume of travelers that affect baggage
performance?
The Data Baggage Complaints.jmp
The data set contains monthly observations from 2004 to 2010
for United Airlines, American Eagle, and
Hawaiian Airlines. The variables in the data set include:
Baggage The total number of passenger complaints for theft of
baggage contents, or for
lost, damaged, or misrouted luggage for the airline that month
Scheduled The total number of flights scheduled by that airline
that month
Cancelled The total number of flights cancelled by that airline
that month
Enplaned The total number of passengers who boarded a plane
with the airline that month
These data are available from the U.S. Department of
Transportation, “Air Travel Consumer Report,” the
Office of Aviation Enforcement and Proceedings, Aviation
Consumer Protection Division
(http://airconsumer.dot.gov/reports/index.htm). The data for
baggage complaints and enplaned
passengers cover domestic travel only.
Analysis
We start by exploring baggage complaints over time.
Exhibit 1 shows the time series plot for the variable Baggage by
Date for each of the airlines. United
Airlines has the most complaints about mishandled baggage in
almost all of the months in the data set;
Hawaiian Airlines has the fewest number of complaints in all
months. Do we conclude, then, that United
Airlines has the “worst record” for mishandled baggage and
Hawaiian, the best?
Exhibit 1 Time Series Plots of Baggage by Airline
(Graph > Graph Builder; drag and drop Baggage in Y, Date in X
and Airline in Overlay. Click on the smoother icon
at the top to remove the smoother, and click on the line icon.
Or, right-click in the graph, and select Smoother >
Remove, and Points > Change to > Line. Then, click Done.)
United Airlines is a much bigger airline, as evidenced by
Exhibit 2, which shows the average number of
scheduled flights and enplaned passengers by airline. United
handles more than three times as many passengers as American Eagle on average, and almost eight times as many as Hawaiian. Thus,
United has more opportunities to mishandle luggage because it
handles more luggage – it’s simply a
much bigger airline.
Exhibit 2 Average Scheduled Flights and Enplaned Passengers
by Airline
(Analyze > Tabulate; drag Airline in drop zone for
rows, and Scheduled and Enplaned in the drop
zone for column as analysis columns. Then, drag
Mean from the middle panel to the middle of the
table.
Note that in JMP versions 10 and earlier Tabulate is
under the Tables menu.)
To adjust for size we calculate the rate of baggage complaints
(Exhibit 3): Baggage % = 100 ×
(Baggage/Enplaned).
Exhibit 3 Calculating Baggage %
(Create a new column in the data table, and
rename it Baggage %. Right click on the
column header, and select Formula to open
the Formula Editor. To create the formula:
1. Type 100
2. Select multiply on the keypad
3. Select Baggage from the columns list
4. Select divide by on the keypad
5. Select Enplaned from the columns list
6. Click OK.)
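The same calculation in code form (a sketch; the counts in the example are hypothetical, not from the data set):

```python
# Baggage % from Exhibit 3, expressed as code rather than a JMP formula
# column: complaints per 100 enplaned passengers.
def baggage_pct(baggage, enplaned):
    return 100 * baggage / enplaned

# Hypothetical month: 1,200 complaints among 480,000 enplaned passengers.
rate = baggage_pct(1200, 480_000)
```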
In Exhibit 4 we compare the records of the three airlines using
Baggage %. We see that American
Eagle has the highest rate of baggage complaints when adjusted
for number of enplaned passengers.
Exhibit 4 Average Baggage % by Airline
Plotting the Baggage % on a time series plot allows us to see
changes over time. In Exhibit 5 we see
that baggage complaint rates from American Eagle and United
passengers increased through 2006 and
began declining thereafter.
Exhibit 5 Time Series Plots of Baggage % by Airline
The time series for Hawaiian passengers in Exhibit 5 is
relatively flat compared to American Eagle and
United, so it’s difficult to detect a pattern over time. In Exhibit
6 we isolate the data for the Hawaiian
flights. Complaint rates for Hawaiian passengers began to drop in the summer of 2008 and continued to decline until the fall of 2010, after which the rate of complaints returned to historical levels.
Exhibit 6 Data Filter and Time Series Plot of Baggage % -
Hawaiian Airline
(Rows > Data Filter; select Airline
and click Add. Then, select
Hawaiian and check the Show and
Include boxes.)
Let’s return to Exhibit 5: Do we see any other patterns in
baggage complaint rates over time? The pattern
of spikes and dips indicates that changes in the rate of baggage
complaints may have a seasonal
component. In Exhibit 7 we plot the average Baggage % by
month for the three airlines to investigate
this further.
Average rates are highest in December and January and in the
summer months, and lowest in the spring
and late fall. Interestingly, all three airlines seem to follow the
same general pattern. (Using the Data
Filter to zoom in on Hawaiian will make this more apparent.)
Exhibit 7 Time Series Plots of Monthly Average Baggage % by
Airline
(Graph > Graph Builder; drag and drop Baggage % in Y, Month
in X and Airline in Group X. Then, click on the line icon at the
top.
Or, right-click in the graph, and select Points > Change to >
Line. Then, click Done.)
Summary
Statistical Insights
It is important to ask a lot of pointed questions about the
purpose and scope of a project before
jumping into data analysis. How will the results be used? And,
is the information available to answer
the questions being asked? The data set provided did not have
information on flight destinations, so
we are not able to investigate whether destination is related to
baggage issues.
To provide a fair comparison of performance across airlines,
it’s best to standardize for differences in
volume. It is also important to pay close attention to units of
measurement, as will be seen in an
exercise.
Managerial Implications
When asking a data analyst to conduct a study, ensure that the
purpose is clear. What, specifically,
do you want to know? And, how will the information provided
by the analyst be used to guide the
decision-making process?
What did we learn from the analysis? Managers at American
Eagle should note the relatively high
rate of baggage complaints and consider costs and benefits of
improvements. All three airlines
should anticipate high rates of baggage complaints in December, January, and the summer months and
should plan accordingly. If we are interested in studying
baggage complaints for different
destinations, additional data are required.
JMP Features and Hints
This case uses the Graph Builder to compare multiple time
series and Tabulate to produce summary
statistics by groups. Data Filter is used with the Graph Builder
to further investigate a particular
group.
Exercises
1. Exhibit 4 shows that the average Baggage % for American
Eagle is 1.033. Provide a
nontechnical interpretation of this number by using it in this
sentence: “1.033 is the…” Pay
particular attention to units of measurement.
2. Use Tabulate to calculate the standard deviations of Baggage
% for each airline. Provide a nontechnical interpretation of the standard deviations. Is one
airline more variable than the others?
3. Compare the three airlines based on cancelled flights: Which
airline has the best record? The
worst?
Case 8 - Contributions:
Simple Linear Regression and Time
Series
Marlene Smith, University of Colorado Denver
Business School
Contributions1:
Simple Linear Regression and Time Series
Background
The Colorado Combined Campaign solicits Colorado
government employees’ participation in a fund-raising drive. Funds raised by the campaign go to over 700
Colorado charities in all, including the
Humane Society of Boulder Valley and the Denver Children’s
Advocacy Center. Prominent state
employees, such as university presidents, chancellors and
lieutenant governors, head the annual
campaigns. An advisory committee determines whether the
charities receiving contributions provide the
services claimed in a fiscally responsible manner.
All Colorado state employees may contribute to the fund.
However, certain state institutions are targeted
to receive promotional brochures and campaign literature.
Employees in these targeted groups are
referred to as “eligible” employees. Each year, the number of
eligible employees is known in June.
Fund-raising activities are then conducted throughout the fall.
By year’s end, total contributions raised
that year are tabulated.
The Task
It is now June 2010. The number of eligible employees for
2010 has been determined to be 53,455.
Does knowing the number of eligible employees help predict
2010 year-end contributions?
The Data
Contributions.jmp
This is an annual time series from 1988–2009. The variables
are contribution Year and:
Actual Total contributions to the campaign for the year in
dollars
Employees Number of eligible employees that year
Analysis
The average level of contributions during this time period was
$1,143,769, with a typical fluctuation of
$339,788 around the average. The average number of eligible
employees was 45,419, with a typical
fluctuation of 9,791.
Exhibit 1 Summary Statistics for Actual and Employees
(Analyze > Tabulate; drag Actual and Employees in drop
zone for rows as analysis columns. Then, drag Mean and
Std Dev from the middle panel to drop zone for columns.
Note that in JMP versions 10 and earlier Tabulate is under
the Tables menu.)
1Mel Rael, Executive Director of the Colorado Combined
Campaign, graciously provided these data.
As we can see in Exhibit 2, contributions are growing over
time:
Exhibit 2 Time Series Plot of Actual by Year
(Graph > Graph Builder; drag and drop Actual in Y and Year in
X. Click on the smoother icon at
the top to remove the smoother. Hold the shift key and click
the line icon to add a line. Or, right
click in the graph to select these options. Then, click Done.)
The long-term growth in contributions is attributable to two
phenomena:
• The amount contributed per eligible employee trends mostly upward (Exhibit 3, top).
• The number of eligible employees is on the rise, particularly
in the 1999 to 2002 campaign years
(Exhibit 3, bottom).
Exhibit 3 Time Series Plots of Actual per Employee and
Employees
(Create a new column and rename it Actual
per Employee, then use the Formula Editor
to create the formula – Actual divided by
Employees.
Follow the instructions for Exhibit 2 to create
the graph for Employees. Then, click and
drag Actual per Employee above
Employees in Y, and release.
To change the markers for points, use the
lasso from the toolbar to select the points
(draw a circle around them). Then go to
Rows > Markers and select a marker.)
The scatterplot and least squares regression line using Actual as the response variable and Employees as the predictor variable are shown in Exhibit 4. The formula for the regression line is found below the plot under Linear Fit. The slope of the fitted line, 33.555, estimates
the contribution for each eligible employee
over this time period. Hence, the model estimates an additional
$33.56 in contributions for each eligible
employee. Under Parameter Estimates, we see that the number
of employees is a statistically significant
predictor of year-end contributions; the p-value, listed as Prob >
|t|, is < 0.0001.
The number of employees doesn’t perfectly predict
contributions. Just over 93% of the variability in
contributions is associated with variability in number of eligible
employees (RSquare = 0.934907).
Comparing the standard deviation of Actual ($339,788) to the root mean square error of the regression (RMSE = $88,832) suggests that a substantial
reduction in the variation in contributions occurs
by using the regression model to explain variation in year-end
contributions.
Exhibit 4 Regression with Actual (Y) and Employees (X)
(Analyze > Fit Y by X. Use Actual as Y,
Response and Employees as X, Factor.
Under the red triangle select Fit Line. Note:
To remove the markers in Exhibit 3, go to the
Rows menu and select Clear Row States.)
We’ve been informed that the number of eligible employees in
2010 is 53,455. To use the regression
equation to forecast 2010 year-end contributions, we can plug
this number into the regression equation.
If the number of Employees is 53,455, the predicted Actual
contributions is:
Actual = -380265.5 + (33.555042) x Employees
= -380265.5 + (33.555042) x (53,455)
= 1413419.3 (or, $1,413,419)
In words, given that the number of eligible employees is 53,455,
our model estimates that 2010 year-end
contributions will be approximately $1.413 million.
Easier still, we can skip the math exercise, save the regression
formula and prediction intervals and ask
JMP to calculate the estimated contributions for 2010 (Exhibit
5). Prediction intervals are useful, since
the number of employees isn’t a perfect predictor of
contributions. The prediction interval gives us an
estimate of the interval in which the 2010 year-end
contributions will fall (with 95% confidence).
Exhibit 5 Predicted Value and Prediction Interval for 2010
Contribution
(In the Bivariate Fit window, select Save Predicteds under the
red Triangle for Linear Fit. JMP will create a new column with
the
prediction formula for Actual. Create a new row and enter a
value for Employees – the predicted value for Actual will
display. To
save prediction intervals, use Analyze > Fit Model; select
Actual as Y and Employees as a model effect, and hit Run.
Under the
red triangle select Save Columns > Indiv Confidence Intervals.)
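As a rough cross-check on Exhibit 5, the point forecast and an approximate 95% prediction interval can be reconstructed from the summary statistics reported in the case (coefficients from Exhibit 4, the RMSE from the fit, and the mean and standard deviation of Employees from Exhibit 1). This sketch uses the standard simple-regression prediction-interval formula, so small rounding differences from JMP's exact output are expected:

```python
from math import sqrt

# Coefficients and summary statistics as reported in the case study.
b0, b1 = -380265.5, 33.555042     # intercept and slope (Exhibit 4)
rmse = 88832.0                    # root mean square error of the fit
n = 22                            # annual observations, 1988-2009
x_bar, x_sd = 45419.0, 9791.0     # mean and SD of Employees (Exhibit 1)
x0 = 53455                        # eligible employees known for 2010

y_hat = b0 + b1 * x0              # point forecast for 2010

# Standard simple-regression prediction interval for a new observation.
sxx = (n - 1) * x_sd ** 2
se_pred = rmse * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
t_crit = 2.086                    # 97.5th percentile of t with n-2 = 20 df
lower, upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
```

The upper limit works out to roughly $1.6 million, consistent with the upper 95% prediction limit cited under Managerial Implications.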
Predicted values can also be explored dynamically using the
cross-hair tool. In Exhibit 6, we see that the
predicted value for Actual, if Employees is 53,414, is around
$1.402 million.
Exhibit 6 Using Cross-hair Tool to Explore Predicted
Contribution
(Select the cross-hair tool on the toolbar. Click
on the regression line at the value of the
predictor to see the predicted response value.)
We can also graphically explore prediction intervals (Exhibit 7).
Exhibit 7 Prediction Intervals for Actual
(In the Bivariate Fit window, select
Confid Curves Indiv under the red
Triangle next to Linear Fit. Use
the cross-hairs to find the upper
and lower bounds for the
prediction interval.)
Summary
Statistical Insights
Forecasting using regression involves substituting known or
hypothetical values for X into the
regression equation and solving for Y. In this case, values for
the predictor variable in the forecasting
horizon are known in advance; i.e., we know that the 2010 value
for Employees is 53,455, so we
plugged this value into the regression equation to forecast year-end contributions. In another setting,
in which the same-year value for X is unknown, how would we
proceed? One possibility is to forecast
the value of the predictor variable. Another possibility, when
theoretically and statistically justified, is
to use lagged values of the original predictor variables in the
regression model.
When building any regression model, residuals should be
checked to ensure that the linear fit makes
sense.
Managerial Implications
Regression has provided a prediction for year-end 2010
Colorado Combined Campaign contributions
of $1.4M. In managerial settings such as this, where the
response variable represents a business
goal, managers often set higher expectations than the predicted value to motivate improved
performance. One such choice here might be the upper 95%
prediction limit of $1.6M.
This forecasting methodology can be repeated year after year.
Once the final contributions to 2010
are known, they can be added to the data set and the regression
line can be recalculated. By
midyear of 2011, the number of eligible employees will be
known.
Note that, in this case, we focused on a simple regression using only Employees as the predictor. We could also fit a model with both Employees and Year. We will consider
regression models with more than one
predictor in a future case.
JMP Features and Hints
In this case we used Fit Y by X to develop a regression model.
We used the cross-hair tool to explore
the predicted value of the response at a given value of the
predictor. Several options, such as saving
predicted values and showing prediction intervals, are available
under the red triangle for the fitted
line. When the prediction formula is saved, a new column with
the regression formula is created.
Enter the value of X in a new row in the JMP data table, and the
predicted value will display. To save
prediction intervals to the data table for the value of X, use Fit
Model.
Note that other intervals and model diagnostics are also
available from both Fit Y by X and Fit Model.
To generate residual plots from within Fit Y by X, select the
option under the red triangle next to
Linear Fit.
Exercises
A regression trend analysis uses only the information contained
in the passage of time to predict a
response variable.
1. Perform a trend analysis with the Colorado Combined
Campaign data, using Actual as the
response variable and Year as the predictor.
2. Forecast the 2010 - 2013 Colorado Combined Campaign
contributions.
3. Compare your forecast for 2010 with that obtained from the
simple linear regression model in
which number of eligible employees is the predictor variable.
Hint: Compare RMSE, RSquare,
and the estimated contributions for 2010. Which model does a
better job of explaining variation in
contributions?
4. We’ve limited our analyses to one predictor variable at a
time. Guesstimate what would happen, in
terms of RMSE, RSquare and model predictions if we were to
build a model with both Year and
Employees.