Intermediate
Regression Topics
  Daniel Gerlanc, Director
    Enplus Advisors Inc
Topics


Abalone Data

Variable Transformation

Simulation for Predictive Inference
http://archive.ics.uci.edu/ml/datasets/Abalone




                   Abalone
Loading the data
>   abalone.path = "~/data/abalone.csv"
>   abalone.cols = c("sex", "length", "diameter", "height", "whole.wt",
+                    "shucked.wt", "viscera.wt", "shell.wt", "rings")
>
>   abalone <- read.csv(abalone.path, sep=",", row.names=NULL,
+                       col.names=abalone.cols)
>   str(abalone)

'data.frame':   4177 obs. of  9 variables:
 $ sex       : chr "M" "M" "F" "M" ...
 $ length    : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
 $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
 $ height    : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
 $ whole.wt : num 0.514 0.226 0.677 0.516 0.205 ...
 $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
 $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ...
 $ shell.wt : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
 $ rings     : int 15 7 9 10 7 8 20 16 9 19 ...
Uses lattice graphics




             Draw pictures
Lattice Plots
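> library(lattice)
> # NOTE: `volume` is not one of the columns loaded above; it is assumed to be
> # a derived column added to `abalone` on a slide not captured here.
> xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=TRUE, pch=".",
       subset=volume < 0.2,
       panel=function(x, y, ...) {
          panel.lmline(x, y, ...)   # least-squares line for each panel
          panel.xyplot(x, y, ...)   # the points themselves
       },
       ylab="rings")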


ggplot2 is a newer package that can be used to create similar plots.
Combine groups
[Figure: panels Infant and Adult]
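The code that combines the groups is not captured in this transcript. A minimal sketch of one way to do it, assuming the UCI sex codes "M", "F" and "I" and a 0/1 coding (infant = 0, adult = 1) that matches how `sex == 0` and `sex == 1` are used in the later slides; the `toy` data frame is hypothetical:

```r
# Toy rows using the three sex codes found in the UCI abalone file
toy <- data.frame(sex = c("M", "F", "I", "I", "M"),
                  rings = c(15, 9, 7, 8, 10))

# Assumed recoding: infant ("I") -> 0, adult ("M" or "F") -> 1
toy$sex <- as.integer(toy$sex != "I")

toy$sex         # one 0/1 value per row
table(toy$sex)  # counts of infants (0) and adults (1)
```

With a numeric 0/1 `sex`, `lm()` reports a single `sex` coefficient, which is what the model summaries in this deck show.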
Why Transform?


Interpretability

Additive vs. Multiplicative Form

Prediction
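A small sketch of the additive vs. multiplicative distinction, using synthetic data rather than the abalone file (the variables `x` and `y` and the true exponent of 0.5 are made up for illustration):

```r
# Synthetic data: y depends on x multiplicatively (power law with noise)
set.seed(1)
x <- runif(200, 0.1, 1)
y <- exp(1 + 0.5 * log(x) + rnorm(200, sd = 0.1))

# Additive form: one extra unit of x adds a fixed amount to y
fit.add <- lm(y ~ x)

# Multiplicative form: on the log-log scale, a proportional change in x
# produces a proportional change in y
fit.mult <- lm(log(y) ~ log(x))

b <- coef(fit.mult)[["log(x)"]]
b     # estimate of the exponent used to generate the data
2^b   # factor by which predicted y changes when x doubles
```

Taking logs turns the multiplicative relationship into a linear one, so ordinary least squares still applies and the slope reads as an elasticity.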
Simple Model
> fit.1 <- lm(rings ~ sex + shell.wt, abalone)

> summary(fit.1)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.2423    0.0799   78.08   <2e-16 ***
sex            0.9142    0.0984    9.29   <2e-16 ***
shell.wt      12.8581    0.3300   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Centering with z-scores


 Subtract the mean from each input and
 divide by 1 or 2 standard deviations

 Dummy/Proxy variables may be centered as
 well
Center Values
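> # Assumed definitions (not shown on the slide), inferred from the models fit below
> outcome <- "rings"
> predictors <- c("sex", "shell.wt")
> abalone.adj <- abalone[, c(outcome, predictors)]
> for (i in predictors) {
+   abalone.adj[[i]] <-
+     (abalone.adj[[i]] - mean(abalone.adj[[i]])) / (2 * sd(abalone.adj[[i]]))
+ }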

Also look into the ‘scale’ function
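As a hedged sketch of that suggestion, base R's `scale()` reproduces the manual centering when it is given twice the standard deviation as its `scale` argument. The vector `x` below is a synthetic stand-in for a numeric predictor:

```r
# Synthetic stand-in for a numeric predictor (hypothetical values)
set.seed(1)
x <- rnorm(50, mean = 0.24, sd = 0.14)

# Manual version: subtract the mean, divide by 2 standard deviations
z.manual <- (x - mean(x)) / (2 * sd(x))

# scale() subtracts the column mean and divides by the value passed as `scale`
z.scale <- as.numeric(scale(x, center = TRUE, scale = 2 * sd(x)))

all.equal(z.manual, z.scale)
sd(z.scale)   # a variable divided by 2 SDs has an SD of 0.5
```

`scale()` returns a one-column matrix with attributes, hence the `as.numeric()` before comparing.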
Why center?


Interpret coefficients in terms of standard
deviations

Gives a sense of variable importance
Interpretability
> fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)

> summary(fit.1a)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone.adj)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9337     0.0385 258.33    <2e-16 ***
sex           0.8539     0.0919    9.29   <2e-16 ***
shell.wt      3.5798     0.0919   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Multiple R-squared: 0.406,   Adjusted R-squared: 0.406
F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
Two Models
    lm(formula = rings ~ sex + shell.wt, data = abalone)
                coef.est coef.se
    (Intercept) 6.24      0.08
    sex          0.91     0.10
    shell.wt    12.86     0.33
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



    lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                coef.est coef.se
    (Intercept) 9.93     0.04
    sex         0.85     0.09
    shell.wt    3.58     0.09
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



Smaller difference in SD terms
Why divide by 2 SDs
So binary variables may be interpreted
similarly to continuous variables

e.g., a binary variable taking the values 0 and 1 with equal
frequency has an SD of 0.5:
sqrt(0.5 * (1 - 0.5)) = 0.5

Divide by 2 SDs                   Divide by 1 SD
(1 - 0.5) / (2 * 0.5) = +0.5      (1 - 0.5) / (1 * 0.5) = +1
(0 - 0.5) / (2 * 0.5) = -0.5      (0 - 0.5) / (1 * 0.5) = -1
-0.5 --> +0.5: Diff of 1          -1 --> +1: Diff of 2
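The arithmetic can be checked directly in R. This is a small sketch with a made-up balanced 0/1 vector `b`; it uses the population formula sqrt(p * (1 - p)) from the slide rather than `sd()`, which divides by n - 1:

```r
b <- rep(c(0, 1), each = 500)     # 0 and 1 with equal frequency
p <- mean(b)                      # proportion of ones: 0.5
sd.b <- sqrt(p * (1 - p))         # population SD of a binary variable: 0.5

z2 <- (c(0, 1) - p) / (2 * sd.b)  # divide by 2 SDs: -0.5, +0.5
z1 <- (c(0, 1) - p) / (1 * sd.b)  # divide by 1 SD:  -1, +1

diff(z2)  # 1: the rescaled variable still moves by one unit from 0 to 1
diff(z1)  # 2
```

After dividing by 2 SDs, a one-unit change means roughly "from 1 SD below the mean to 1 SD above" for a continuous input and "from 0 to 1" for a balanced binary input, so the two kinds of coefficients are on a comparable scale.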
Prediction
Simulation
Allow for more general inferences

Propagation of uncertainty
Prediction Errors
90th Percentile Adult vs. Median (50th Percentile) Infant
    library(arm)  # provides sigma.hat(), used below for the residual SD
    fit.4   <- lm(log(rings) ~ sex + log(shell.wt), abalone)

    large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
    small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
    x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
    x.i <- sum(c(1, 0, small.infant) * coef(fit.4))

    set.seed(1)
    n.sims <- 1000
    pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
    pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
    pred.diff <- pred.a - pred.i

    > mean(pred.diff)
    4.5

    > quantile(pred.diff, c(0.025, 0.975))

    2.5% 98%
    -1.9 11.3
Simulation for Inferential Uncertainty
Simulate residual standard deviation


 Simulate
Inferential Uncertainty
## Create 1000 simulations of the residual standard error and coefficients

fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)

n.sims      <-   1000
obj         <-   summary(fit.5) # save off the summary object
sigma.hat   <-   obj$sigma
b.hat       <-   obj$coefficients[, 'Estimate', drop=TRUE]
cov.beta    <-   obj$cov.unscaled # unscaled covariance matrix, (X'X)^-1
k           <-   obj$df[1] # number of estimated coefficients, including the intercept
n           <-   obj$df[1] + obj$df[2] # number of observations

set.seed(1)
sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))

beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
for (i in seq_len(n.sims)) {
  beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
}
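A self-contained sketch of how such simulations can be summarised, run on synthetic data rather than the abalone file. To stay within base R it draws the coefficients with a Cholesky factor instead of `MASS::mvrnorm`; the names `x`, `y`, `fit`, `V` and `z` are illustrative assumptions:

```r
# Synthetic regression data (hypothetical): true intercept 1, true slope 2
set.seed(1)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)
fit <- lm(y ~ x)

obj    <- summary(fit)
b.hat  <- coef(fit)
V      <- obj$cov.unscaled   # (X'X)^-1
k      <- length(b.hat)
n.k    <- obj$df[2]          # residual degrees of freedom, n - k
n.sims <- 1000

# Step 1: simulate the residual standard deviation
sigma.sim <- obj$sigma * sqrt(n.k / rchisq(n.sims, n.k))

# Step 2: simulate coefficients given each sigma draw.
# Rows of z %*% chol(V) have covariance V; each row is scaled by its sigma.
z <- matrix(rnorm(n.sims * k), n.sims, k)
beta.sim <- sweep(sigma.sim * (z %*% chol(V)), 2, b.hat, "+")
colnames(beta.sim) <- names(b.hat)

# 95% simulation interval for the slope, next to the classical interval
quantile(beta.sim[, "x"], c(0.025, 0.975))
confint(fit)["x", ]
```

The two intervals should be close. The advantage of keeping the draws is that any function of the coefficients (a ratio, a prediction, a difference between groups) gets an interval the same way, by applying the function to each row of `beta.sim`.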
Inferential Uncertainty

Elevating Tactical DDD Patterns Through Object Calisthenics
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

Boston Predictive Analytics: Linear and Logistic Regression Using R - Intermediate Topics

  • 1. Intermediate Regression Topics Daniel Gerlanc, Director Enplus Advisors Inc
  • 4. Loading the data
      > abalone.path = "~/data/abalone.csv"
      > abalone.cols = c("sex", "length", "diameter", "height", "whole.wt",
      +                  "shucked.wt", "viscera.wt", "shell.wt", "rings")
      >
      > abalone <- read.csv(abalone.path, sep=",", row.names=NULL,
      +                     col.names=abalone.cols)
      > str(abalone)
      'data.frame': 4177 obs. of 9 variables:
       $ sex       : chr "M" "M" "F" "M" ...
       $ length    : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
       $ diameter  : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
       $ height    : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
       $ whole.wt  : num 0.514 0.226 0.677 0.516 0.205 ...
       $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
       $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ...
       $ shell.wt  : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
       $ rings     : int 15 7 9 10 7 8 20 16 9 19 ...
  • 5. Uses lattice graphics Draw pictures
  • 6. Lattice Plots
      > xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=T, pch=".",
               subset=volume < 0.2,
               panel=function(x, y, ...) {
                 panel.lmline(x, y, ...)
                 panel.xyplot(x, y, ...)
               },
               ylab="rings")
      ggplot2 is a newer package that can be used to create similar plots.
  • 7. Infant Adult Combine groups
  • 8. Why Transform? Interpretability Additive vs. Multiplicative Form Prediction
  • 9. Simple Model
      > fit.1 <- lm(rings ~ sex + shell.wt, abalone)
      > summary(fit.1)
      Call:
      lm(formula = rings ~ sex + shell.wt, data = abalone)
      Residuals:
         Min     1Q Median     3Q    Max
      -5.750 -1.592 -0.535  0.886 15.736
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   6.2423     0.0799   78.08   <2e-16 ***
      sex           0.9142     0.0984    9.29   <2e-16 ***
      shell.wt     12.8581     0.3300   38.96   <2e-16 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      Residual standard error: 2.5 on 4174 degrees of freedom
  • 10. Centering with z-scores
      Subtract the mean from each input and divide by 1 or 2 standard deviations.
      Dummy/proxy variables may be centered as well.
  • 11. Center Values
      > abalone.adj <- abalone[, c(outcome, predictors)]
      for (i in predictors) {
        abalone.adj[[i]] <- (abalone.adj[[i]] - mean(abalone.adj[[i]])) /
          (2 * sd(abalone.adj[[i]]))
      }
      Also look into the 'scale' function.
  • 12. Why center?
      Interpret coefficients in terms of standard deviations.
      Gives a sense of variable importance.
  • 13. Interpretability
      > fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)
      > summary(fit.1a)
      Call:
      lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
      Residuals:
         Min     1Q Median     3Q    Max
      -5.750 -1.592 -0.535  0.886 15.736
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   9.9337     0.0385  258.33   <2e-16 ***
      sex           0.8539     0.0919    9.29   <2e-16 ***
      shell.wt      3.5798     0.0919   38.96   <2e-16 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      Residual standard error: 2.5 on 4174 degrees of freedom
      Multiple R-squared: 0.406, Adjusted R-squared: 0.406
      F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
  • 14. Two Models
      lm(formula = rings ~ sex + shell.wt, data = abalone)
                  coef.est coef.se
      (Intercept)  6.24     0.08
      sex          0.91     0.10
      shell.wt    12.86     0.33
      ---
      n = 4177, k = 3
      residual sd = 2.49, R-Squared = 0.41
      lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                  coef.est coef.se
      (Intercept)  9.93     0.04
      sex          0.85     0.09
      shell.wt     3.58     0.09
      ---
      n = 4177, k = 3
      residual sd = 2.49, R-Squared = 0.41
      Smaller difference in SD terms
  • 15. Why divide by 2 SDs
      So binary variables may be interpreted similarly to continuous variables.
      e.g., a binary variable taking the values 0 and 1 with equal frequency has an sd of 0.5:
        sqrt(0.5 * (1 - 0.5)) = 0.5
      Dividing by 2 SDs:
        (1 - 0.5) / (2 * 0.5) = +0.5
        (0 - 0.5) / (2 * 0.5) = -0.5
        -0.5 --> +0.5: diff of 1
      Dividing by 1 SD:
        (1 - 0.5) / 0.5 = +1
        (0 - 0.5) / 0.5 = -1
        -1 --> +1: diff of 2
  • 17. Simulation Allow for more general inferences Propagation of uncertainty
  • 18. Prediction Errors
      90th Percentile Adult vs. 50th Percentile (Median) Infant
      fit.4 <- lm(log(rings) ~ sex + log(shell.wt), abalone)
      large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
      small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
      x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
      x.i <- sum(c(1, 0, small.infant) * coef(fit.4))
      set.seed(1)
      n.sims <- 1000
      # sigma.hat() comes from the arm package
      pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
      pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
      pred.diff <- pred.a - pred.i
      > mean(pred.diff)
      4.5
      > quantile(pred.diff, c(0.025, 0.975))
      2.5%  98%
      -1.9 11.3
  • 19. Simulation for Inferential Uncertainty Simulate residual standard deviation Simulate
  • 20. Inferential Uncertainty
      ## Create 1000 simulations of the residual standard error and coefficients
      fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)
      n.sims <- 1000
      obj <- summary(fit.5)        # save off the summary object
      sigma.hat <- obj$sigma
      b.hat <- obj$coef[, 'Estimate', drop=TRUE]
      cov.beta <- obj$cov.unscaled # extract the unscaled covariance matrix
      k <- obj$df[1]               # number of coefficients (including the intercept)
      n <- obj$df[1] + obj$df[2]   # number of observations
      set.seed(1)
      sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))
      beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
      for (i in seq_len(n.sims)) {
        beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
      }
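
The final slide stops once the simulated coefficients have been drawn. As a minimal sketch of what might come next, the R code below repeats the same steps on synthetic data and summarizes the draws. The `toy` data frame, its generating coefficients, and the quantile summary are assumptions for illustration only; they are not from the slides, and the abalone data is not used.

```r
# Sketch: simulate inferential uncertainty for a linear model.
# ASSUMPTION: synthetic data stands in for the abalone data set.
set.seed(1)
n.obs <- 500
toy <- data.frame(sex = rbinom(n.obs, 1, 0.5),        # 0 = infant, 1 = adult
                  shell.wt = runif(n.obs, 0.05, 0.6))
toy$rings <- exp(1.8 + 0.2 * toy$sex + 1.5 * toy$shell.wt +
                 rnorm(n.obs, 0, 0.25))

fit <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, data = toy)
obj <- summary(fit)

sigma.hat <- obj$sigma                    # residual standard error
b.hat <- obj$coefficients[, "Estimate"]   # point estimates
cov.beta <- obj$cov.unscaled              # unscaled covariance of the coefficients
k <- obj$df[1]                            # number of coefficients
n <- obj$df[1] + obj$df[2]                # number of observations

# Draw the residual sd first, then the coefficients given that draw.
n.sims <- 1000
sigma.sim <- sigma.hat * sqrt((n - k) / rchisq(n.sims, n - k))
beta.sim <- matrix(NA_real_, n.sims, k, dimnames = list(NULL, names(b.hat)))
for (i in seq_len(n.sims)) {
  beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
}

# Summarize each coefficient by its simulated 2.5%, 50% and 97.5% quantiles.
sim.summary <- t(apply(beta.sim, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(round(sim.summary, 3))

# The classical intervals should be close to the simulated ones.
print(round(confint(fit), 3))
```

Each row of `beta.sim` is one plausible set of coefficients, so any quantity computed from the coefficients (a predicted value, a difference between groups) can be evaluated row by row to get its simulated distribution.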
