Quick Regression Review
Mini-case Studies
Vishal Singh
NYU-Stern
What we should know
 Basics of the software
 Know the difference between a continuous & discrete
(nominal) variable
 Know how to summarize a continuous variable (e.g. mean income) and a nominal
variable (e.g. % Female)
 Relationship between 2 variables
 Both continuous (correlation)
 Both Nominal (cross-tab, mosaic plot)
 One Continuous & one Nominal (e.g. take Mean of continuous
variable by Nominal)
 Understand p-value: The only time we are interested in a
‘statistical’ test is when doing controlled experiments
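The summaries listed above can be sketched in plain Python. This is only an illustration: the customer records, field names, and values below are made up and are not part of the course data.

```python
from statistics import mean

# Hypothetical records: income is continuous, gender and region are nominal.
customers = [
    {"income": 52.0, "gender": "F", "region": "East"},
    {"income": 61.0, "gender": "M", "region": "West"},
    {"income": 48.0, "gender": "F", "region": "West"},
    {"income": 75.0, "gender": "M", "region": "East"},
    {"income": 66.0, "gender": "F", "region": "East"},
]

# Continuous variable: summarize with the mean.
mean_income = mean(c["income"] for c in customers)

# Nominal variable: summarize with a share (e.g. % Female).
pct_female = 100 * sum(c["gender"] == "F" for c in customers) / len(customers)

# One continuous and one nominal: mean of the continuous variable by group.
def mean_by(records, group_key, value_key):
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[value_key])
    return {g: mean(v) for g, v in groups.items()}

income_by_gender = mean_by(customers, "gender", "income")

# Two nominal variables: a cross-tab of counts.
crosstab = {}
for c in customers:
    key = (c["gender"], c["region"])
    crosstab[key] = crosstab.get(key, 0) + 1

print(mean_income, pct_female, income_by_gender, crosstab)
```

The same three patterns (mean, share, mean by group) are what a stats package reports when you ask for distributions and group summaries.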
Why do we need models?
o Graphs are useful for understanding but don’t
scale (when we have too many potential
predictors).
o We want to automate the analysis
o Which Ad to display?
o How to provide an insurance quote based on the
information provided by a new customer
o Conduct ‘what-if’ analysis for planning Black Friday.
Example
Predicting auto insurance
Traditional measures
Usage based
GPS device
Forecasting sales for Sony digital camera at Best
Buy
Build a demand model based on historical data from
1000 stores
Regression: Key Points
Regression: widely used research tool
 Determine whether the independent variables explain a significant
variation in the dependent variable: whether a relationship exists.
 Determine how much of the variation in the dependent variable can
be explained by the independent variables: strength of the
relationship.
 Control for other independent variables when evaluating the
contributions of a specific variable or set of variables. Marginal effect
 Forecast/Predict the values of the dependent variable.
 Use regression results as inputs to additional computations:
Optimal pricing, promotion, time to launch a product….
Exercise 1:
Box Office Revenue Prediction (see JMP file Box Office)
D3M
Box Office Prediction
Suppose you are helping Warner Bros. in
developing a model for forecasting Box Office
revenues for their new movie The Watchman. In
the file “BoxOffice.csv” you are provided the
opening week revenues (in millions of $) for
various past movies along with several predictor
variables:
Variable | Description of the Variable
Opening_Week_Revenue | Opening week revenue in millions of $
# of Theaters | Number of movie theaters in which each movie was initially released
Overall Rating | Critic ratings for each movie (a higher number implies more favorable ratings)
Genre | 1 for Action, 2 for Comedy, 3 for Kids, and 4 for Other
Data
Movie | Opening_Week_Revenue | Num_Theaters | Overall_Rating | Genre
The Dark Knight | 158.4 | 4366 | 82 | 1
Iron Man | 98.6 | 4105 | 79 | 1
Indiana Jones and the Kingdom of the Crystal Skull | 100.1 | 4260 | 65 | 1
Hancock | 62.6 | 3965 | 49 | 1
Quantum of Solace | 67.5 | 3451 | 58 | 1
The Incredible Hulk | 55.4 | 3505 | 61 | 1
Wanted | 50.9 | 3175 | 64 | 1
Get Smart | 38.7 | 3911 | 54 | 1
The Mummy: Tomb of the Dragon Emperor | 40.5 | 3760 | 31 | 1
Journey to the Center of the Earth | 21 | 2811 | 57 | 1
Eagle Eye | 29.2 | 3510 | 43 | 1
10,000 B.C. | 35.9 | 3410 | 34 | 1
Valkyrie | 21 | 2711 | 56 | 1
Jumper | 27.4 | 3428 | 35 | 1
Cloverfield | 40.1 | 3411 | 64 | 1
The Day the Earth Stood Still (2008) | 30.5 | 3560 | 40 | 1
Hellboy II: The Golden Army | 34.5 | 3204 | 78 | 1
Spider-Man 3 | 151.1 | 4252 | 59 | 1
Transformers | 70.5 | 4011 | 61 | 1
Pirates of the Caribbean: At World's End | 114.7 | 4362 | 50 | 1
Objective
 Develop a regression model for “Opening
week Revenues” and all other variables as
predictors. Interpret your parameters.
 Prediction: The attributes for the movie
“Watchman” are as follows:
– Theaters= 3611, Rating= 57, Action= 1
– Given this information, what are the predicted
first week revenues for the new movie
Watchman?
Bivariate Relationship with Predictors
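For two continuous variables, the bivariate relationship is summarized by the correlation. A minimal sketch follows, using only the 20 movies listed in the Data slide above. The full BoxOffice.csv contains more movies, so the values from the JMP file will differ. The helper name `pearson` is an illustrative choice.

```python
from math import sqrt

# (Opening_Week_Revenue, Num_Theaters, Overall_Rating) for the 20 listed movies.
rows = [
    (158.4, 4366, 82), (98.6, 4105, 79), (100.1, 4260, 65), (62.6, 3965, 49),
    (67.5, 3451, 58), (55.4, 3505, 61), (50.9, 3175, 64), (38.7, 3911, 54),
    (40.5, 3760, 31), (21.0, 2811, 57), (29.2, 3510, 43), (35.9, 3410, 34),
    (21.0, 2711, 56), (27.4, 3428, 35), (40.1, 3411, 64), (30.5, 3560, 40),
    (34.5, 3204, 78), (151.1, 4252, 59), (70.5, 4011, 61), (114.7, 4362, 50),
]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

revenue = [r[0] for r in rows]
theaters = [r[1] for r in rows]
rating = [r[2] for r in rows]

r_theaters = pearson(revenue, theaters)
r_rating = pearson(revenue, rating)
print(f"corr(revenue, theaters) = {r_theaters:.2f}")
print(f"corr(revenue, rating)   = {r_rating:.2f}")
```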
Developing a Regression Model
D3M
Regression: Forecasting Box-office Revenues
 You need to convert the “Genre” variable into a series of dummy variables. This
is a nominal variable (i.e. categories such as 1=Action, 2=Comedy, ...). Adding this
variable directly into the regression does not teach us anything, because the numeric
codes are arbitrary: our coding could just as well have been 1=Comedy, 2=Action, ...
 In addition, note that the total number of dummy variables we include/need is 1
less than the number of categories. The left-out category is absorbed in the
intercept.
 It does not matter what you leave out—all included dummy variables will be
interpreted with respect to what you leave out.
 For example, suppose we leave out “Action” and include dummy variables for
“comedy”, “kids” and “other”. The output of this regression:
Regression with Genre Dummy Variables Only
We left out “Action” as the
base. Compare the Intercept &
Average for Action
Just looking at the means, we
see that “Kids” movies generate
$11.56mn less than Action
(56.66 - 45.10 = 11.56). This
difference, with a negative sign, is the
coefficient for ‘Kids’ in the
regression.
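The dummy coding and the "intercept equals the base-category mean" reading can be sketched as follows. The toy revenues are hypothetical (not the BoxOffice.csv values), and the helper names `dummy_code` and `group_means` are assumptions for illustration. The genre codes follow the variable table (1 = Action, 2 = Comedy, 3 = Kids, 4 = Other).

```python
GENRES = {1: "Action", 2: "Comedy", 3: "Kids", 4: "Other"}

def dummy_code(genre_codes, base):
    """Return one 0/1 column per genre except the left-out base category."""
    kept = [g for g in sorted(GENRES) if g != base]
    return {GENRES[g]: [1 if code == g else 0 for code in genre_codes] for g in kept}

def group_means(genre_codes, y):
    """Mean of y within each genre code."""
    sums, counts = {}, {}
    for code, value in zip(genre_codes, y):
        sums[code] = sums.get(code, 0.0) + value
        counts[code] = counts.get(code, 0) + 1
    return {code: sums[code] / counts[code] for code in sums}

# Hypothetical toy data.
genre = [1, 1, 2, 2, 3, 3, 4, 4]
revenue = [60.0, 50.0, 30.0, 34.0, 40.0, 48.0, 20.0, 26.0]

dummies = dummy_code(genre, base=1)  # Action is left out: 3 dummies for 4 genres
means = group_means(genre, revenue)

# With only genre dummies in the model, least squares gives:
intercept = means[1]  # mean of the base category
coefs = {GENRES[g]: means[g] - means[1] for g in means if g != 1}

# Check the normal equations: residuals sum to zero overall and within each dummy.
fitted = [intercept + sum(coefs[name] * col[i] for name, col in dummies.items())
          for i in range(len(revenue))]
residuals = [y - f for y, f in zip(revenue, fitted)]
for name, col in dummies.items():
    assert abs(sum(r * d for r, d in zip(residuals, col))) < 1e-9
assert abs(sum(residuals)) < 1e-9

print(intercept, coefs)
```

Because the residuals are orthogonal to the intercept and to every dummy column, these group-mean differences are exactly the least-squares coefficients for a genre-only model.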
Output from JMP
Note: In JMP output, go to red triangle and then select Estimates- Indicator Function
Parameterization to get “dummy” variable output
JMP Output
What is the interpretation of
Action here?
Leave out Comedy this time
We left out “Comedy” this time,
which is the intercept now.
See that Action is 24.68 More
than Comedy. Compare this to
the -24.68 coefficient on
Comedy in the previous
regression
Obviously none of the model fit
statistics change. The coefficients get
adjusted based on the left-out
category (Comedy in this case)
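A small sketch of why switching the left-out category only re-expresses the same fit. The Action and Kids means are the ones quoted earlier (56.66 and 45.10); the Comedy mean is backed out from the -24.68 coefficient; "Other" is omitted because its mean is not stated in the text. The function name is an assumption for illustration.

```python
# Genre means in $mn: Action and Kids as quoted above; Comedy is derived as
# 56.66 - 24.68. "Other" is left out because its mean is not given.
genre_means = {"Action": 56.66, "Kids": 45.10, "Comedy": 56.66 - 24.68}

def dummy_coefficients(means, base):
    """Intercept and dummy coefficients for a genre-only regression."""
    intercept = means[base]
    coefs = {g: m - intercept for g, m in means.items() if g != base}
    return intercept, coefs

b0_action, coefs_action = dummy_coefficients(genre_means, base="Action")
b0_comedy, coefs_comedy = dummy_coefficients(genre_means, base="Comedy")

# Same gap, opposite sign, depending on which category is the base.
print(round(coefs_action["Comedy"], 2))  # Comedy relative to Action
print(round(coefs_comedy["Action"], 2))  # Action relative to Comedy

# Fitted genre means are identical under both codings, so model fit is unchanged.
for g in genre_means:
    fit_a = b0_action + coefs_action.get(g, 0.0)
    fit_c = b0_comedy + coefs_comedy.get(g, 0.0)
    assert abs(fit_a - fit_c) < 1e-9
```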
Add All Predictors
• Regression with OWR (Opening Week Revenue) as the dependent variable & # of Theaters,
Ratings, Genre as predictors
# of Theaters: The coefficient is the change in OWR (in $mn) for
each additional theater, holding Overall_Rating and Genre fixed.
Overall_Rating: Each additional point in overall
rating increases OWR by $0.278mn, holding # of Theaters and Genre fixed.
Genre (Kids): Compared to “Other”, kids
movies generate $17.53mn less in OWR after
controlling for the effect of # of Theaters and
Ratings
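To make the "add all predictors, then predict" step concrete, here is a hedged sketch that fits revenue on theaters and rating by least squares and then scores the new movie (Theaters = 3611, Rating = 57). It uses only the 20 movies listed in the Data slide, all of which are Action, so genre dummies cannot be estimated here and the coefficients will not match the JMP output from the full file. The helpers `solve` and `ols` are illustrative, not part of any course material.

```python
# (Opening_Week_Revenue, Num_Theaters, Overall_Rating) for the 20 listed movies.
rows = [
    (158.4, 4366, 82), (98.6, 4105, 79), (100.1, 4260, 65), (62.6, 3965, 49),
    (67.5, 3451, 58), (55.4, 3505, 61), (50.9, 3175, 64), (38.7, 3911, 54),
    (40.5, 3760, 31), (21.0, 2811, 57), (29.2, 3510, 43), (35.9, 3410, 34),
    (21.0, 2711, 56), (27.4, 3428, 35), (40.1, 3411, 64), (30.5, 3560, 40),
    (34.5, 3204, 78), (151.1, 4252, 59), (70.5, 4011, 61), (114.7, 4362, 50),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

X = [[1.0, theaters, rating] for _, theaters, rating in rows]  # 1.0 = intercept
y = [revenue for revenue, _, _ in rows]
b0, b_theaters, b_rating = ols(X, y)

# Attributes given for the new movie: Theaters = 3611, Rating = 57.
predicted = b0 + b_theaters * 3611 + b_rating * 57
print(f"intercept={b0:.2f}, per theater={b_theaters:.4f}, per rating point={b_rating:.3f}")
print(f"predicted opening week revenue: {predicted:.1f} ($mn)")
```

With the full dataset you would append the genre dummy columns to each row of `X` and add the matching coefficient to the prediction.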
Exercise 2: Impact of Southwest
Context
Southwest & the Wright Amendment
Click on article or
google “Southwest
Wright Amendment”
to get context
Impact of Southwest Airlines on Price
Suppose you are representing Southwest and want to claim that
presence of SW in a market is good for consumers-- because it lowers
the fares.
For analysis, you are provided data on Fares from approximately 600
“city-pairs” with following variables:
 Objective: Analyze the impact of Southwest
presence on the average fares
Snapshot of Data
Start with Distributions
Anything Unusual?
Compare Mean Fare by SW
NOTE: If you square the t-ratio 6.71
(6.71 * 6.71 ≈ 45.02), you get the F-ratio of 45.03, up to rounding of the t-ratio
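The link between the t-ratio and the F-ratio can be checked numerically. The sketch below computes a pooled two-sample t-ratio and a one-way ANOVA F-ratio on made-up fares (not the course data) and shows that t squared equals F when there are exactly two groups. The function names are illustrative.

```python
from math import sqrt

def pooled_t(a, b):
    """Two-sample t-ratio assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical fares, not the course data.
fares_without_sw = [210.0, 185.0, 240.0, 199.0, 225.0]
fares_with_sw = [150.0, 172.0, 139.0, 160.0]

t = pooled_t(fares_without_sw, fares_with_sw)
f = anova_f([fares_without_sw, fares_with_sw])
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```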
Basic intuition of Regression Based Models
o Conceptually, fares do not just depend on presence of
Southwest
o Other factors
o In our example: Competition, Distance
o Analyze relationship b/w these variables & Fares
o In analyzing output with single predictors, note the
correspondence between regression output vs. ANOVA (t-
test)
o We get the same output from regression as a t-test or ANOVA
o More important point is to understand the workings of a
“dummy” variable in regression
Know how to
interpret these
What happens when
we treat “# of other
airlines” as nominal
vs. continuous
variable
Conceptual & Practical Tip
“Recoding Variable”
 Collapse # of other airlines from 6 categories to 4.
 Arbitrary based on distribution of data
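A recode like this is a simple mapping. The slides do not say which categories were merged, so the cut-off below (0, 1, 2, and 3 or more) is purely an assumption for illustration, as are the function name and the example counts.

```python
from collections import Counter

def collapse_airlines(n_other):
    """Collapse the count of other airlines into 4 levels: 0, 1, 2, 3+."""
    return str(n_other) if n_other < 3 else "3+"

# Hypothetical counts of other airlines for ten city-pairs.
other_airlines = [0, 1, 1, 2, 3, 5, 4, 2, 0, 3]
recoded = [collapse_airlines(n) for n in other_airlines]

print(Counter(other_airlines))  # six categories before recoding
print(Counter(recoded))         # four categories after recoding
```

In practice the cut-off is chosen by looking at the distribution, so that no recoded category is left with only a handful of city-pairs.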
Framing this as a Regression Problem
Regression of Fares on
Southwest. Understand how
Dummy variable is coded
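How the dummy works can be seen in a regression with a single 0/1 predictor: the intercept is the mean of the group coded 0 and the slope is the difference in group means. The fares below are hypothetical, and the coding (1 = Southwest present) is an assumption; if the software codes absence as 1, the slope simply changes sign.

```python
def simple_regression(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Hypothetical fares; sw = 1 if Southwest serves the city-pair, else 0.
sw = [0, 0, 0, 0, 0, 1, 1, 1, 1]
fare = [210.0, 185.0, 240.0, 199.0, 225.0, 150.0, 172.0, 139.0, 160.0]

b0, b1 = simple_regression(sw, fare)

mean_no_sw = sum(f for s, f in zip(sw, fare) if s == 0) / sw.count(0)
mean_sw = sum(f for s, f in zip(sw, fare) if s == 1) / sw.count(1)

print(f"intercept = {b0:.2f} (mean fare without Southwest = {mean_no_sw:.2f})")
print(f"slope     = {b1:.2f} (difference in means = {mean_sw - mean_no_sw:.2f})")
```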
Understand Output
Rsquare: Of the
total variation in
Fares, 41.6% is
explained by our
model
Distance is the most
important predictor
& Southwest is least
important
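The Rsquare figure is one minus the ratio of unexplained to total variation. A minimal sketch of that calculation, with hypothetical fares and hypothetical fitted values (the function name is illustrative):

```python
def r_squared(actual, fitted):
    """One minus unexplained variation (SSE) over total variation (SST)."""
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)             # total variation
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # unexplained
    return 1 - sse / sst

# Hypothetical fares and model predictions for five city-pairs.
actual = [200.0, 150.0, 260.0, 180.0, 210.0]
fitted = [190.0, 165.0, 240.0, 185.0, 220.0]
print(f"R-squared = {r_squared(actual, fitted):.3f}")
```

For a least-squares fit with an intercept, this quantity is the share of variation explained, which is how the 41.6% on the slide should be read.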
Interpretation Of Coefficients
Southwest: After controlling for Distance and Competition (# of airlines),
the absence of Southwest in the market increases fares by approximately $49.
Distance: Increasing distance by 100 miles increases the fare by $21.5.
# of Airlines: Increasing the number of airlines serving the market by 1 reduces
the fare by approximately $41.
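These coefficients can be used for simple what-if arithmetic. The sketch below uses only the approximate values quoted above; the intercept is not given in the text, so it computes changes in the predicted fare rather than fare levels. The function and constant names are assumptions for illustration.

```python
# Approximate coefficients quoted above (dollars).
EFFECT_NO_SOUTHWEST = 49.0    # Southwest absent vs. present
EFFECT_PER_MILE = 21.5 / 100  # $21.5 per 100 miles
EFFECT_PER_AIRLINE = -41.0    # each additional airline in the market

def fare_change(southwest_leaves=False, extra_miles=0.0, extra_airlines=0):
    """Predicted change in fare, holding the other predictors fixed."""
    change = EFFECT_NO_SOUTHWEST if southwest_leaves else 0.0
    change += EFFECT_PER_MILE * extra_miles
    change += EFFECT_PER_AIRLINE * extra_airlines
    return change

print(fare_change(extra_miles=100))
print(fare_change(southwest_leaves=True, extra_airlines=1))
```

The second call illustrates the "controlling for" idea: losing Southwest while gaining one other airline nets out to a much smaller change than either effect alone.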
• Least Squares Principle: Choose the β’s so that the sum of the
squared prediction errors,
$$SSQ = \sum_{m=1}^{M} \left( \mathrm{Fare}_m - \beta_0 - \beta_1 \mathrm{SW}_m - \beta_2 \mathrm{Dist}_m - \beta_3 \mathrm{Comp}_m \right)^2$$
is as small as possible.
Ok, but what does that mean? Open the file SSQ_Intuition.xls
How does the software Compute the parameters?
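The idea behind the spreadsheet exercise can be sketched in a few lines: pick candidate β values, compute the sum of squared prediction errors, and prefer the candidate with the smaller sum. The city-pairs and both sets of guesses below are hypothetical. Real software does not guess; it solves for the minimizing β’s directly.

```python
def ssq(betas, data):
    """Sum of squared prediction errors for Fare = b0 + b1*SW + b2*Dist + b3*Comp."""
    b0, b1, b2, b3 = betas
    return sum((fare - (b0 + b1 * sw + b2 * dist + b3 * comp)) ** 2
               for fare, sw, dist, comp in data)

# Hypothetical city-pairs: (fare, southwest 0/1, distance in miles, # other airlines).
data = [
    (120.0, 1, 300, 2), (210.0, 0, 500, 1), (180.0, 0, 400, 3),
    (150.0, 1, 600, 2), (260.0, 0, 900, 0), (140.0, 1, 450, 4),
]

guess_a = (100.0, -40.0, 0.20, -10.0)
guess_b = (150.0, -50.0, 0.10, -10.0)

ssq_a, ssq_b = ssq(guess_a, data), ssq(guess_b, data)
better = "guess_a" if ssq_a < ssq_b else "guess_b"
print(f"SSQ(guess_a) = {ssq_a:.1f}, SSQ(guess_b) = {ssq_b:.1f}, better fit: {better}")
```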
Average Fare by # of Airlines
Split by Presence of Southwest (Interactions—for later)
Conclusion
 T-test and ANOVA are
both used to compare
means across different
groups
 T-test for 2 groups and
ANOVA for many
groups
 We can always convert
the question to a
regression problem
using dummy variables
 Advantage of
regression is that it is
straightforward to
control for any number
of other variables that
might impact the
outcome
 From now on, we will
focus on regression
analysis

More Related Content

What's hot

Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modeling
Esteban Ribero
 
Module5.slp
Module5.slpModule5.slp
Module5.slpGimylin
 
Module5.slp
Module5.slpModule5.slp
Module5.slpGimylin
 
Xue paper-01-13-12
Xue paper-01-13-12Xue paper-01-13-12
Xue paper-01-13-12
Yuhong Xue
 
Over Priced Listings
Over Priced ListingsOver Priced Listings
Over Priced ListingsKent Lardner
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSS
Osama Yousaf
 
Value investing and emerging markets
Value investing and emerging marketsValue investing and emerging markets
Value investing and emerging markets
Navneet Randhawa
 
Feature selection
Feature selectionFeature selection
Feature selection
Abhinav Katoch
 
WeikaiLi_Publication
WeikaiLi_PublicationWeikaiLi_Publication
WeikaiLi_PublicationWeikai Li
 
Capm theory portfolio management
Capm theory   portfolio managementCapm theory   portfolio management
Capm theory portfolio management
Bhaskar T
 
Mevsys Data Mining: Knowledge Discovery.
Mevsys Data Mining: Knowledge Discovery.Mevsys Data Mining: Knowledge Discovery.
Mevsys Data Mining: Knowledge Discovery.
Mevsys Data Mining
 
Black_JPM93_Beta_And_return
Black_JPM93_Beta_And_returnBlack_JPM93_Beta_And_return
Black_JPM93_Beta_And_returnRussell Abrams
 
Expected value return & standard deviation
Expected value return & standard deviationExpected value return & standard deviation
Expected value return & standard deviation
Jahanzeb Memon
 
Quantifying an association to predict future events chapt
Quantifying an association to predict future events chaptQuantifying an association to predict future events chapt
Quantifying an association to predict future events chapt
MARK547399
 
The X Factor
The X FactorThe X Factor
The X Factor
yamanote
 

What's hot (19)

Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modeling
 
Module5.slp
Module5.slpModule5.slp
Module5.slp
 
Module5.slp
Module5.slpModule5.slp
Module5.slp
 
Xue paper-01-13-12
Xue paper-01-13-12Xue paper-01-13-12
Xue paper-01-13-12
 
Assignment
AssignmentAssignment
Assignment
 
Over Priced Listings
Over Priced ListingsOver Priced Listings
Over Priced Listings
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSS
 
Value investing and emerging markets
Value investing and emerging marketsValue investing and emerging markets
Value investing and emerging markets
 
Feature selection
Feature selectionFeature selection
Feature selection
 
WeikaiLi_Publication
WeikaiLi_PublicationWeikaiLi_Publication
WeikaiLi_Publication
 
Capm theory portfolio management
Capm theory   portfolio managementCapm theory   portfolio management
Capm theory portfolio management
 
Demand Estimation
Demand EstimationDemand Estimation
Demand Estimation
 
Mevsys Data Mining: Knowledge Discovery.
Mevsys Data Mining: Knowledge Discovery.Mevsys Data Mining: Knowledge Discovery.
Mevsys Data Mining: Knowledge Discovery.
 
Black_JPM93_Beta_And_return
Black_JPM93_Beta_And_returnBlack_JPM93_Beta_And_return
Black_JPM93_Beta_And_return
 
Demand forcasting
Demand forcastingDemand forcasting
Demand forcasting
 
Expected value return & standard deviation
Expected value return & standard deviationExpected value return & standard deviation
Expected value return & standard deviation
 
Quantifying an association to predict future events chapt
Quantifying an association to predict future events chaptQuantifying an association to predict future events chapt
Quantifying an association to predict future events chapt
 
The X Factor
The X FactorThe X Factor
The X Factor
 
muthu.shree
muthu.shreemuthu.shree
muthu.shree
 

Similar to Regressioin mini case

Week14_Business Simulation Modeling MSBA.pptx
Week14_Business Simulation Modeling MSBA.pptxWeek14_Business Simulation Modeling MSBA.pptx
Week14_Business Simulation Modeling MSBA.pptx
Usamamalik345378
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
James by CrowdProcess
 
Linear_Regression
Linear_RegressionLinear_Regression
Linear_Regression
Mohamed Essam
 
Decision theory
Decision theoryDecision theory
Decision theory
Aditya Mahagaonkar
 
Marketing Research Approaches .docx
Marketing Research Approaches .docxMarketing Research Approaches .docx
Marketing Research Approaches .docx
alfredacavx97
 
UNIT - I Reinforcement Learning .pptx
UNIT - I Reinforcement Learning .pptxUNIT - I Reinforcement Learning .pptx
UNIT - I Reinforcement Learning .pptx
DrUdayKiranG
 
Chapter 04
Chapter 04 Chapter 04
Chapter 04
Tuul Tuul
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
Risk Concept And Management 5
Risk Concept And Management 5Risk Concept And Management 5
Risk Concept And Management 5
rajeevgupta
 
Brown bag 2012_fall
Brown bag 2012_fallBrown bag 2012_fall
Brown bag 2012_fallXiaolei Zhou
 
Faster and cheaper, smart ab experiments - public ver.
Faster and cheaper, smart ab experiments - public ver.Faster and cheaper, smart ab experiments - public ver.
Faster and cheaper, smart ab experiments - public ver.
Marsan Ma
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
ijaia
 
1 chapter 04
1 chapter 041 chapter 04
1 chapter 04
NELSON DUBE
 
Predicting Movie Success Using Neural Network
Predicting Movie Success Using Neural NetworkPredicting Movie Success Using Neural Network
Predicting Movie Success Using Neural Network
International Journal of Science and Research (IJSR)
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social Networks
Wei Lu
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
ScientificRevenue
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
Manojit Nandi
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use cases
Sridhar Ratakonda
 
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
IRJET Journal
 

Similar to Regressioin mini case (20)

Week14_Business Simulation Modeling MSBA.pptx
Week14_Business Simulation Modeling MSBA.pptxWeek14_Business Simulation Modeling MSBA.pptx
Week14_Business Simulation Modeling MSBA.pptx
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
Linear_Regression
Linear_RegressionLinear_Regression
Linear_Regression
 
Decision theory
Decision theoryDecision theory
Decision theory
 
Marketing Research Approaches .docx
Marketing Research Approaches .docxMarketing Research Approaches .docx
Marketing Research Approaches .docx
 
UNIT - I Reinforcement Learning .pptx
UNIT - I Reinforcement Learning .pptxUNIT - I Reinforcement Learning .pptx
UNIT - I Reinforcement Learning .pptx
 
Chapter 04
Chapter 04 Chapter 04
Chapter 04
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
Risk Concept And Management 5
Risk Concept And Management 5Risk Concept And Management 5
Risk Concept And Management 5
 
Brown bag 2012_fall
Brown bag 2012_fallBrown bag 2012_fall
Brown bag 2012_fall
 
Faster and cheaper, smart ab experiments - public ver.
Faster and cheaper, smart ab experiments - public ver.Faster and cheaper, smart ab experiments - public ver.
Faster and cheaper, smart ab experiments - public ver.
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
1 chapter 04
1 chapter 041 chapter 04
1 chapter 04
 
Predicting Movie Success Using Neural Network
Predicting Movie Success Using Neural NetworkPredicting Movie Success Using Neural Network
Predicting Movie Success Using Neural Network
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social Networks
 
Pro max icdm2012-slides
Pro max icdm2012-slidesPro max icdm2012-slides
Pro max icdm2012-slides
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use cases
 
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM: A R...
 

More from veesingh

Slalom
SlalomSlalom
Slalom
veesingh
 
Identification1
Identification1Identification1
Identification1
veesingh
 
Brand Asset Case Study
Brand Asset Case StudyBrand Asset Case Study
Brand Asset Case Study
veesingh
 
Fat Tax Slideshow
Fat Tax SlideshowFat Tax Slideshow
Fat Tax Slideshow
veesingh
 
Correlation causality
Correlation causalityCorrelation causality
Correlation causality
veesingh
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learningveesingh
 
Obesity
ObesityObesity
Obesity
veesingh
 
Field experiments
Field experimentsField experiments
Field experiments
veesingh
 
Brand mining
Brand miningBrand mining
Brand mining
veesingh
 
D3M Commodity
D3M Commodity D3M Commodity
D3M Commodity
veesingh
 
D3M Online Reviews
D3M Online ReviewsD3M Online Reviews
D3M Online Reviews
veesingh
 
D3M Politics
D3M PoliticsD3M Politics
D3M Politics
veesingh
 

More from veesingh (12)

Slalom
SlalomSlalom
Slalom
 
Identification1
Identification1Identification1
Identification1
 
Brand Asset Case Study
Brand Asset Case StudyBrand Asset Case Study
Brand Asset Case Study
 
Fat Tax Slideshow
Fat Tax SlideshowFat Tax Slideshow
Fat Tax Slideshow
 
Correlation causality
Correlation causalityCorrelation causality
Correlation causality
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Obesity
ObesityObesity
Obesity
 
Field experiments
Field experimentsField experiments
Field experiments
 
Brand mining
Brand miningBrand mining
Brand mining
 
D3M Commodity
D3M Commodity D3M Commodity
D3M Commodity
 
D3M Online Reviews
D3M Online ReviewsD3M Online Reviews
D3M Online Reviews
 
D3M Politics
D3M PoliticsD3M Politics
D3M Politics
 

Recently uploaded

Premium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern BusinessesPremium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern Businesses
SynapseIndia
 
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Boris Ziegler
 
amptalk_RecruitingDeck_english_2024.06.05
amptalk_RecruitingDeck_english_2024.06.05amptalk_RecruitingDeck_english_2024.06.05
amptalk_RecruitingDeck_english_2024.06.05
marketing317746
 
Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...
dylandmeas
 
Improving profitability for small business
Improving profitability for small businessImproving profitability for small business
Improving profitability for small business
Ben Wann
 
The effects of customers service quality and online reviews on customer loyal...
The effects of customers service quality and online reviews on customer loyal...The effects of customers service quality and online reviews on customer loyal...
The effects of customers service quality and online reviews on customer loyal...
balatucanapplelovely
 
Authentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto RicoAuthentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto Rico
Corey Perlman, Social Media Speaker and Consultant
 
Recruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media MasterclassRecruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media Masterclass
LuanWise
 
Business Valuation Principles for Entrepreneurs
Business Valuation Principles for EntrepreneursBusiness Valuation Principles for Entrepreneurs
Business Valuation Principles for Entrepreneurs
Ben Wann
 
Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111
zoyaansari11365
 
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdfikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
agatadrynko
 
Cracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptxCracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptx
Workforce Group
 
Digital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and TemplatesDigital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and Templates
Aurelien Domont, MBA
 
Training my puppy and implementation in this story
Training my puppy and implementation in this storyTraining my puppy and implementation in this story
Training my puppy and implementation in this story
WilliamRodrigues148
 
Building Your Employer Brand with Social Media
Building Your Employer Brand with Social MediaBuilding Your Employer Brand with Social Media
Building Your Employer Brand with Social Media
LuanWise
 
In the Adani-Hindenburg case, what is SEBI investigating.pptx
In the Adani-Hindenburg case, what is SEBI investigating.pptxIn the Adani-Hindenburg case, what is SEBI investigating.pptx
In the Adani-Hindenburg case, what is SEBI investigating.pptx
Adani case
 
Buy Verified PayPal Account | Buy Google 5 Star Reviews
Buy Verified PayPal Account | Buy Google 5 Star ReviewsBuy Verified PayPal Account | Buy Google 5 Star Reviews
Buy Verified PayPal Account | Buy Google 5 Star Reviews
usawebmarket
 
The Influence of Marketing Strategy and Market Competition on Business Perfor...
The Influence of Marketing Strategy and Market Competition on Business Perfor...The Influence of Marketing Strategy and Market Competition on Business Perfor...
The Influence of Marketing Strategy and Market Competition on Business Perfor...
Adam Smith
 
Maksym Vyshnivetskyi: PMO Quality Management (UA)
Maksym Vyshnivetskyi: PMO Quality Management (UA)Maksym Vyshnivetskyi: PMO Quality Management (UA)
Maksym Vyshnivetskyi: PMO Quality Management (UA)
Lviv Startup Club
 
Mastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnapMastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnap
Norma Mushkat Gaffin
 

Recently uploaded (20)

Premium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern BusinessesPremium MEAN Stack Development Solutions for Modern Businesses
Premium MEAN Stack Development Solutions for Modern Businesses
 
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
Agency Managed Advisory Board As a Solution To Career Path Defining Business ...
 
amptalk_RecruitingDeck_english_2024.06.05
amptalk_RecruitingDeck_english_2024.06.05amptalk_RecruitingDeck_english_2024.06.05
amptalk_RecruitingDeck_english_2024.06.05
 
Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...Discover the innovative and creative projects that highlight my journey throu...
Discover the innovative and creative projects that highlight my journey throu...
 
Improving profitability for small business
Improving profitability for small businessImproving profitability for small business
Improving profitability for small business
 
The effects of customers service quality and online reviews on customer loyal...
The effects of customers service quality and online reviews on customer loyal...The effects of customers service quality and online reviews on customer loyal...
The effects of customers service quality and online reviews on customer loyal...
 
Authentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto RicoAuthentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto Rico
 
Recruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media MasterclassRecruiting in the Digital Age: A Social Media Masterclass
Recruiting in the Digital Age: A Social Media Masterclass
 
Business Valuation Principles for Entrepreneurs
Business Valuation Principles for EntrepreneursBusiness Valuation Principles for Entrepreneurs
Business Valuation Principles for Entrepreneurs
 
Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111Introduction to Amazon company 111111111111
Introduction to Amazon company 111111111111
 
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdfikea_woodgreen_petscharity_cat-alogue_digital.pdf
ikea_woodgreen_petscharity_cat-alogue_digital.pdf
 
Cracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptxCracking the Workplace Discipline Code Main.pptx
Cracking the Workplace Discipline Code Main.pptx
 
Digital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and TemplatesDigital Transformation and IT Strategy Toolkit and Templates
Digital Transformation and IT Strategy Toolkit and Templates
 
Training my puppy and implementation in this story
Training my puppy and implementation in this storyTraining my puppy and implementation in this story
Training my puppy and implementation in this story
 
Building Your Employer Brand with Social Media
Building Your Employer Brand with Social MediaBuilding Your Employer Brand with Social Media
Building Your Employer Brand with Social Media
 
In the Adani-Hindenburg case, what is SEBI investigating.pptx
In the Adani-Hindenburg case, what is SEBI investigating.pptxIn the Adani-Hindenburg case, what is SEBI investigating.pptx
In the Adani-Hindenburg case, what is SEBI investigating.pptx
 
Buy Verified PayPal Account | Buy Google 5 Star Reviews
Buy Verified PayPal Account | Buy Google 5 Star ReviewsBuy Verified PayPal Account | Buy Google 5 Star Reviews
Buy Verified PayPal Account | Buy Google 5 Star Reviews
 
The Influence of Marketing Strategy and Market Competition on Business Perfor...
The Influence of Marketing Strategy and Market Competition on Business Perfor...The Influence of Marketing Strategy and Market Competition on Business Perfor...
The Influence of Marketing Strategy and Market Competition on Business Perfor...
 
Maksym Vyshnivetskyi: PMO Quality Management (UA)
Maksym Vyshnivetskyi: PMO Quality Management (UA)Maksym Vyshnivetskyi: PMO Quality Management (UA)
Maksym Vyshnivetskyi: PMO Quality Management (UA)
 
Mastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnapMastering B2B Payments Webinar from BlueSnap
Mastering B2B Payments Webinar from BlueSnap
 

Regressioin mini case

  • 1. Quick Regression Review Mini-case Studies Vishal Singh NYU-Stern
  • 2. What we should know  Basics of the software  Know the difference between a continuous & discrete (nominal) variable  Know how to summarize a continuous (e.g. mean income) and nominal (e.g. % Female)  Relationship between 2 variables  Both continuous (correlation)  Both Nominal (cross-tab, mosaic plot)  One Continuous & one Nominal (e.g. take Mean of continuous variable by Nominal)  Understand p-value: Only time we are interested in ‘statistical’ test is when doing controlled experiments
  • 3. Why do need models? o Graphs are useful for understanding but don’t scale (when we have too many potential predictors). o We want to automate the analysis o Which Ad to display? o How to provide an insurance quote based on the information provided by a new customer o Conduct ‘what-if’ analysis for planning Black Friday.
  • 4. Example Predicting auto insurance Traditional measures Usage based GPS device Forecasting sales for Sony digital camera at Best Buy Build a demand model based on historical data from 1000 stores
  • 5. Regression: Key Points Regression: widely used research tool • Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists. • Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship. • Control for other independent variables when evaluating the contributions of a specific variable or set of variables. Marginal effect • Forecast/Predict the values of the dependent variable. • Use regression results as inputs to additional computations: Optimal pricing, promotion, time to launch a product….
  • 6. Exercise 1: Box Office Revenue Prediction (see JMP file Box Office) D3M
  • 8. Box Office Prediction: Suppose you are helping Warner Bros. in developing a model for forecasting Box Office revenues for their new movie The Watchman. In the file “BoxOffice.csv” you are provided the opening week revenues (in millions of $) for various past movies along with several predictor variables. Variables: Opening_Week_Revenue (opening week revenue in millions of $); # of Theaters (number of movie theaters in which each movie was initially released); Overall Rating (critic ratings for each movie; a higher number implies more favorable ratings); Genre (1 for Action, 2 for Comedy, 3 for Kids, and 4 for Other)
  • 9. Data (columns: Movie; Opening_Week_Revenue; Num_Theaters; Overall_Rating; Genre)
      The Dark Knight; 158.4; 4366; 82; 1
      Iron Man; 98.6; 4105; 79; 1
      Indiana Jones and the Kingdom of the Crystal Skull; 100.1; 4260; 65; 1
      Hancock; 62.6; 3965; 49; 1
      Quantum of Solace; 67.5; 3451; 58; 1
      The Incredible Hulk; 55.4; 3505; 61; 1
      Wanted; 50.9; 3175; 64; 1
      Get Smart; 38.7; 3911; 54; 1
      The Mummy: Tomb of the Dragon Emperor; 40.5; 3760; 31; 1
      Journey to the Center of the Earth; 21; 2811; 57; 1
      Eagle Eye; 29.2; 3510; 43; 1
      10,000 B.C.; 35.9; 3410; 34; 1
      Valkyrie; 21; 2711; 56; 1
      Jumper; 27.4; 3428; 35; 1
      Cloverfield; 40.1; 3411; 64; 1
      The Day the Earth Stood Still (2008); 30.5; 3560; 40; 1
      Hellboy II: The Golden Army; 34.5; 3204; 78; 1
      Spider-Man 3; 151.1; 4252; 59; 1
      Transformers; 70.5; 4011; 61; 1
      Pirates of the Caribbean: At World's End; 114.7; 4362; 50; 1
  • 10. Objective • Develop a regression model for “Opening week Revenues” with all other variables as predictors. Interpret your parameters. • Prediction: The attributes for the movie “Watchman” are as follows: Theaters = 3611, Rating = 57, Action = 1. Given this information, what are the predicted first week revenues for the new movie Watchman?
  • 14. Regression: Forecasting Box-office Revenues • You need to convert the “Genre” variable into a series of dummy variables. This is a nominal variable (i.e. categories such as 1=Action, 2=Comedy, ...). Adding this variable directly into the regression does not teach us anything, because the numeric codes are arbitrary labels. For example, our coding could have been 1=Comedy, 2=Action, ... • In addition, note that the total number of dummy variables we include/need is 1 less than the number of categories. The left-out category is absorbed in the intercept. • It does not matter what you leave out: all included dummy variables will be interpreted with respect to what you leave out. • For example, suppose we leave out “Action” and include dummy variables for “comedy”, “kids” and “other”. The output of this regression:
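The dummy coding described on slide 14 can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code` and the `GENRES` lookup are hypothetical, and the genre codes follow the variable description on slide 8.

```python
# Genre codes as described on slide 8.
GENRES = {1: "Action", 2: "Comedy", 3: "Kids", 4: "Other"}


def dummy_code(genre_code, base="Action"):
    """Return 0/1 indicators for every genre except the left-out base.

    With 4 categories we get 4 - 1 = 3 dummy variables; a movie in the
    base category has all three indicators equal to 0.
    """
    if genre_code not in GENRES:
        raise ValueError(f"unknown genre code: {genre_code}")
    genre = GENRES[genre_code]
    return {name: int(name == genre) for name in GENRES.values() if name != base}


for code in GENRES:
    print(code, GENRES[code], dummy_code(code))
```

Passing `base="Comedy"` instead produces indicators for Action, Kids and Other, which corresponds to the change of left-out category discussed on slide 17.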
  • 15. Regression with Genre Dummy Variables Only: We left out “Action” as the base. Compare the intercept with the average for Action. Just looking at the means, we see that “Kids” movies generate (56.66 - 45.10 = 11.56) less than Action. This difference, with a negative sign, is the coefficient for ‘kids’ in the regression.
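The point of slide 15 (the intercept equals the base-group mean, and each dummy coefficient equals a difference in means) can be checked numerically. The sketch below uses a small made-up dataset, not the BoxOffice.csv values, and a hand-written normal-equations solver so it needs no external libraries; `solve` and `ols` are illustrative helpers, not JMP's algorithm.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical opening-week revenues (millions of $), invented for illustration.
data = [("Action", 60.0), ("Action", 50.0), ("Action", 70.0),
        ("Comedy", 30.0), ("Comedy", 40.0),
        ("Kids", 45.0), ("Kids", 55.0),
        ("Other", 20.0), ("Other", 30.0), ("Other", 25.0)]

LEVELS = ["Comedy", "Kids", "Other"]  # "Action" is the left-out base
X = [[1.0] + [1.0 if g == lvl else 0.0 for lvl in LEVELS] for g, _ in data]
y = [rev for _, rev in data]

coefs = ols(X, y)
for name, value in zip(["Intercept"] + LEVELS, coefs):
    print(f"{name:9s} {value:8.2f}")
```

In this toy data the group means are 60 (Action), 35 (Comedy), 50 (Kids) and 25 (Other), so the fitted intercept is the Action mean and each coefficient is that genre's mean minus the Action mean.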
  • 16. Output from JMP. Note: In the JMP output, go to the red triangle and then select Estimates - Indicator Function Parameterization to get “dummy” variable output. What is the interpretation of Action here?
  • 17. Leave out Comedy this time: We left out “Comedy” this time, so Comedy is now the intercept. See that Action is 24.68 more than Comedy. Compare this to the -24.68 coefficient on Comedy in the previous regression. None of the model fit statistics change; only the coefficients get adjusted based on the left-out category (Comedy in this case).
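The sign flip on slide 17 follows directly from the fact that, in a dummy-only regression, each coefficient is a group mean minus the base-group mean. A minimal sketch, using invented revenue numbers and a hypothetical helper `dummy_coefficients`:

```python
from statistics import mean

# Hypothetical opening-week revenues by genre (millions of $), for illustration only.
revenue = {
    "Action": [60.0, 50.0, 70.0],
    "Comedy": [30.0, 40.0],
    "Kids": [45.0, 55.0],
    "Other": [20.0, 30.0, 25.0],
}


def dummy_coefficients(groups, base):
    """Coefficients of a dummy-only regression, written in terms of group means."""
    base_mean = mean(groups[base])
    coefs = {"Intercept": base_mean}
    for name, values in groups.items():
        if name != base:
            coefs[name] = mean(values) - base_mean
    return coefs


with_action_base = dummy_coefficients(revenue, base="Action")
with_comedy_base = dummy_coefficients(revenue, base="Comedy")

print("Base = Action:", with_action_base)
print("Base = Comedy:", with_comedy_base)
```

The Comedy coefficient under the Action base and the Action coefficient under the Comedy base have the same magnitude and opposite signs, while the implied mean for every genre (intercept plus coefficient) is identical under both codings.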
  • 18. Add All Predictors • Regression with OWR (opening week revenue) as the dependent variable and # of Theaters, Ratings, Genre as predictors. # of Theaters: each additional theater changes OWR by the amount of its coefficient, holding rating and genre fixed (value not given in this transcript). Overall_Rating: Each additional point in overall rating increases OWR by $.278mn. Genre (Kids): Compared to “Other”, kids movies generate $17.53mn less in OWR after controlling for the effect of # of Theaters and Ratings
  • 19. Objective • Develop a regression model for “Opening week Revenues” with all other variables as predictors. Interpret your parameters. • Prediction: The attributes for the movie “Watchman” are as follows: Theaters = 3611, Rating = 57, Action = 1. Given this information, what are the predicted first week revenues for the new movie Watchman?
  • 20. Exercise 2: Impact of Southwest
  • 21. Context Southwest & the Wright Amendment Click on article or google “Southwest Wright Amendment” to get context
  • 22. Impact of Southwest Airlines on Price: Suppose you are representing Southwest and want to claim that the presence of SW in a market is good for consumers because it lowers the fares. For analysis, you are provided data on Fares from approximately 600 “city-pairs” with the following variables: • Objective: Analyze the impact of Southwest presence on the average fares
  • 25. Compare Mean Fare by SW. NOTE: If you square the t-ratio of 6.71 (6.71 * 6.71) you get approximately 45.03, the F-ratio
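The note on slide 25 reflects a general identity: with two groups, the squared pooled-variance t-ratio equals the one-way ANOVA F-ratio. (Squaring the displayed 6.71 gives about 45.02; the small gap from 45.03 is presumably rounding of the displayed t-ratio.) The sketch below checks the identity on invented fares; it assumes the slide's t-test is the pooled-variance version, which the transcript does not state.

```python
from statistics import mean


def pooled_t(a, b):
    """Two-sample t-ratio using the pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    sp2 = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def anova_f(a, b):
    """One-way ANOVA F-ratio for two groups."""
    ma, mb = mean(a), mean(b)
    grand = mean(a + b)
    ss_between = len(a) * (ma - grand) ** 2 + len(b) * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df_between = 1
    df_within = len(a) + len(b) - 2
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical average fares ($), invented for illustration.
fares_without_sw = [180.0, 165.0, 200.0, 175.0, 190.0, 170.0]
fares_with_sw = [120.0, 135.0, 110.0, 140.0, 125.0]

t_ratio = pooled_t(fares_without_sw, fares_with_sw)
f_ratio = anova_f(fares_without_sw, fares_with_sw)
print(f"t = {t_ratio:.4f}, t^2 = {t_ratio ** 2:.4f}, F = {f_ratio:.4f}")
```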
  • 26. Basic intuition of Regression Based Models o Conceptually, fares do not just depend on the presence of Southwest o Other factors matter; in our example: Competition, Distance o Analyze the relationship between these variables & Fares o In analyzing output with single predictors, note the correspondence between regression output and ANOVA (t-test) o We get the same output from regression as from a t-test or ANOVA o The more important point is to understand the workings of a “dummy” variable in regression
  • 28. What happens when we treat “# of other airlines” as a nominal vs. a continuous variable?
  • 29. Conceptual & Practical Tip: “Recoding Variable” • Collapse # of other airlines from 6 categories to 4. • The grouping is arbitrary, based on the distribution of the data
  • 30. Framing this as a Regression Problem
  • 31. Regression of Fares on Southwest. Understand how Dummy variable is coded
  • 32. Understand Output. Rsquare: Of the total variation in Fares, 41.6% is explained by our model. Distance is the most important predictor & Southwest is the least important
  • 33. Interpretation Of Coefficients. Southwest: After controlling for Distance and Competition (# of airlines), the absence of Southwest in the market increases fares by approximately $49. Distance: Increasing distance by 100 miles increases the fare by $21.5. # of Airlines: Increasing the number of airlines serving the market by 1 reduces the fare by approximately $41.
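The marginal effects on slide 33 can be turned into a small what-if calculator. Because the intercept is not given in the transcript, the sketch computes only the change in predicted fare between two scenarios (the intercept cancels out). The constants are the approximate values quoted on the slide; the function name and argument names are hypothetical.

```python
# Approximate effects quoted on slide 33 (in $).
SW_ABSENT_EFFECT = 49.0               # fare increase when Southwest is absent
DISTANCE_EFFECT_PER_100_MILES = 21.5  # fare increase per additional 100 miles
AIRLINE_EFFECT = -41.0                # fare change per additional airline


def fare_change(sw_leaves=0, extra_miles=0.0, extra_airlines=0):
    """Predicted change in average fare, holding everything else fixed.

    sw_leaves: +1 if Southwest exits the market, -1 if it enters, 0 if unchanged.
    extra_miles: change in route distance, in miles.
    extra_airlines: change in the number of airlines serving the market.
    """
    return (SW_ABSENT_EFFECT * sw_leaves
            + DISTANCE_EFFECT_PER_100_MILES * extra_miles / 100.0
            + AIRLINE_EFFECT * extra_airlines)


# Southwest enters a market, nothing else changes.
print(fare_change(sw_leaves=-1))
# A route 300 miles longer with one more competing airline.
print(fare_change(extra_miles=300, extra_airlines=1))
```

Each term is a "holding the others fixed" effect, which is what "after controlling for" means in the slide's interpretation.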
  • 34. How does the software compute the parameters? • Least Squares Principle: Choose the β’s so that the sum of the squared prediction errors is as small as possible: $SSQ = \sum_{m=1}^{M} (Fare_m - \beta_0 - \beta_1 SW_m - \beta_2 Dist_m - \beta_3 Comp_m)^2$. Ok, but what does that mean? Open the file SSQ_Intuition.xls
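The least squares idea on slide 34 can be explored by trying different coefficient values and watching the sum of squared errors. The sketch below is a simplified stand-in for that kind of exercise: it uses a single predictor (distance) and invented numbers, not the contents of SSQ_Intuition.xls, and compares a few guesses with the closed-form least-squares solution.

```python
# Hypothetical route distances (miles) and average fares ($), for illustration only.
distance = [200.0, 500.0, 800.0, 1200.0, 1500.0]
fare = [110.0, 160.0, 240.0, 300.0, 390.0]


def ssq(b0, b1):
    """Sum of squared prediction errors for the line fare = b0 + b1 * distance."""
    return sum((f - b0 - b1 * d) ** 2 for f, d in zip(fare, distance))


# Closed-form least-squares estimates for one predictor.
mean_d = sum(distance) / len(distance)
mean_f = sum(fare) / len(fare)
b1_hat = (sum((d - mean_d) * (f - mean_f) for d, f in zip(distance, fare))
          / sum((d - mean_d) ** 2 for d in distance))
b0_hat = mean_f - b1_hat * mean_d

# Compare a few arbitrary guesses with the least-squares solution.
candidates = [(100.0, 0.10), (50.0, 0.25), (b0_hat, b1_hat)]
for b0, b1 in candidates:
    print(f"b0 = {b0:8.3f}, b1 = {b1:6.4f}, SSQ = {ssq(b0, b1):10.2f}")
```

Any move away from `(b0_hat, b1_hat)` raises the SSQ, which is what "as small as possible" means; with several predictors the software solves the same minimization over all the β’s at once.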
  • 35. Average Fare by # of Airlines Split by Presence of Southwest (Interactions—for later)
  • 36. Conclusion • T-test and ANOVA are both used to compare means across different groups – T-test for 2 groups and ANOVA for many groups • We can always convert the question to a regression problem using dummy variables • Advantage of regression is that it is straightforward to control for any number of other variables that might impact the outcome • From now on, we will focus on regression analysis
  • 37. Regression: Key Points Regression: widely used research tool • Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists. • Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship. • Control for other independent variables when evaluating the contributions of a specific variable or set of variables. Marginal effect • Forecast/Predict the values of the dependent variable. • Use regression results as inputs to additional computations: Optimal pricing, promotion, time to launch a product….
