4. Probabilistic Models
Hypothesize two components of the relationship
Deterministic
Random error
Example: Systolic blood pressure of newborns: p = 6d + ϵ
Random error may be due to other factors (e.g. birth weight)
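A minimal Python sketch of the newborn example, separating the deterministic part 6d from the random error ϵ. It assumes d is the age in days and that ϵ is normally distributed with mean 0 and standard deviation 2; neither assumption comes from the slide, and `systolic_pressure` is a hypothetical helper name.

```python
import random

def systolic_pressure(day, rng, noise_sd=2.0):
    """Return (deterministic, observed) pressure for a newborn aged `day` days."""
    deterministic = 6 * day              # deterministic component: 6d
    error = rng.gauss(0.0, noise_sd)     # random error (assumed normal, mean 0)
    return deterministic, deterministic + error

rng = random.Random(0)                   # fixed seed so the run is repeatable
for day in range(1, 6):
    det, obs = systolic_pressure(day, rng)
    print(f"day {day}: deterministic = {det}, observed = {obs:.1f}")
```

Two newborns of the same age get the same deterministic value but different observed values; the gap is the random error, which stands in for unmodeled factors such as birth weight.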
5. Regression Model
Models the relationship between one dependent variable and one or more explanatory variables
bug = α * code size + β * prior bugs + γ * changes + ϵ
Used mainly for prediction and estimation
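The bug model above could be estimated as in the following Python sketch. It assumes ordinary least squares as the fitting method (the slide does not name one), fits without an intercept to match the formula, and uses synthetic data generated from made-up coefficients. `solve` and `fit_least_squares` are hypothetical helper names.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_least_squares(X, y):
    """Coefficients minimizing the sum of squared errors (no intercept)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)                             # normal equations

# Synthetic modules; the "true" coefficients are invented for illustration.
rng = random.Random(42)
true_alpha, true_beta, true_gamma = 0.002, 0.5, 0.1
modules, bugs = [], []
for _ in range(200):
    code_size = rng.uniform(100, 5000)
    prior_bugs = rng.uniform(0, 20)
    changes = rng.uniform(0, 50)
    modules.append([code_size, prior_bugs, changes])
    bugs.append(true_alpha * code_size + true_beta * prior_bugs
                + true_gamma * changes + rng.gauss(0, 1))

alpha, beta, gamma = fit_least_squares(modules, bugs)
print(f"alpha = {alpha:.4f}, beta = {beta:.3f}, gamma = {gamma:.3f}")

# Prediction for a new, hypothetical module.
print(f"predicted bugs: {alpha * 2000 + beta * 5 + gamma * 10:.1f}")
```

The estimated coefficients land close to the values used to generate the data, and the last line shows the prediction use case: plugging a new module's measurements into the fitted equation.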
6. Regression Modeling Steps
1. Hypothesize deterministic component
2. Specify probability distribution of random error term
3. Evaluate fitted model
4. Use model for prediction and estimation
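The four steps can be walked through on a one-variable example. This Python sketch assumes a straight-line deterministic component, a normal error term with mean 0, and least-squares estimation between steps 2 and 3; the data are synthetic and `fit_line` is a hypothetical helper name.

```python
import random
from statistics import mean, stdev

# Step 1: hypothesize the deterministic component y = a + b * x.
# Step 2: assume the random error is normal with mean 0 and constant spread.
rng = random.Random(1)
xs = [float(x) for x in range(1, 31)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]   # synthetic data

def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Estimate the unknown parameters (needed before the model can be evaluated).
a, b = fit_line(xs, ys)

# Step 3: evaluate the fitted model by inspecting its residuals.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"intercept = {a:.2f}, slope = {b:.2f}")
print(f"residual mean = {mean(residuals):.3f}, residual sd = {stdev(residuals):.2f}")

# Step 4: use the model for prediction.
print(f"prediction at x = 40: {a + b * 40:.1f}")
```

Residuals that average to roughly zero with a stable spread are consistent with the error assumption made in step 2; a visible pattern in them would suggest revisiting step 1.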
7. Model Specification
Specifying the deterministic component
1. Define the dependent variable and the independent variable(s)
2. Hypothesize nature of relationship
Functional form (e.g. linear or non‑linear)
Expected effects (i.e., signs of coefficients)
Interactions between variables
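The three choices above (functional form, expected signs, interactions) can be made concrete as feature construction. This Python sketch reuses the code size and changes variables from the bug example; the logarithmic form and all function names are illustrative assumptions, not part of the slides.

```python
import math

def linear_features(code_size, changes):
    """Linear functional form: each variable enters as-is."""
    return [code_size, changes]

def log_features(code_size, changes):
    """Non-linear functional form: the effect of code size levels off."""
    return [math.log(code_size), changes]

def interaction_features(code_size, changes):
    """Linear form plus an interaction term code_size * changes."""
    return [code_size, changes, code_size * changes]

def signs_match(coefficients, expected_signs):
    """Check fitted coefficients against hypothesized signs (+1 or -1)."""
    return all(c * s > 0 for c, s in zip(coefficients, expected_signs))

print(interaction_features(1200, 4))         # [1200, 4, 4800]
print(signs_match([0.002, 0.1], [+1, +1]))   # True
```

Each function returns one row of explanatory values for a different hypothesized relationship; a coefficient whose sign contradicts the expected effect is a cue to re-examine the specification.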
13. Explanatory and predictive power
R² is the measurement of goodness‑of‑fit, i.e., how well the model fits all the training data
R² = 1 − var(Ŷ − Y) / var(Y), where Y is the actual dependent variable and Ŷ is the fitted value
R² is also called a measurement of explanatory power, i.e., how well the model explains the data it is trained on
Predictive power indicates how well the model predicts new data (data not used for training, also called testing data)
MAE = mean(∣Ŷ − Y∣), where Y is the actual dependent variable and Ŷ is the value predicted on testing data
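Both measures are short to compute. This Python sketch implements R² as 1 − var(Ŷ − Y) / var(Y) and MAE as mean(∣Ŷ − Y∣), using population variance for both terms of the ratio (the ratio is the same with sample variance). The function names and the four data points are made up for illustration.

```python
from statistics import mean, pvariance

def r_squared(actual, fitted):
    """Explanatory power: 1 - var(fitted - actual) / var(actual)."""
    residuals = [f - a for a, f in zip(actual, fitted)]
    return 1 - pvariance(residuals) / pvariance(actual)

def mean_absolute_error(actual, predicted):
    """Predictive power: mean(|predicted - actual|), computed on testing data."""
    return mean(abs(p - a) for a, p in zip(actual, predicted))

actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
print(f"R^2 = {r_squared(actual, fitted):.2f}")            # R^2 = 0.80
print(f"MAE = {mean_absolute_error(actual, fitted):.2f}")  # MAE = 0.50
```

The same pair of lists is used for both calls only to keep the example short; in practice R² is computed on the training data and MAE on data held out from training.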
14. Cross‑validation
Used to estimate predictive power when only a single dataset is available:
1. Divide dataset into two subsets: training and testing data
2. Train the model on training data and make prediction for testing data
3. Repeat steps 1 and 2 many times, using a different split each time
4. Compute the final mean absolute error, averaged over all repetitions
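The procedure can be sketched in Python as repeated random train/test splits. The sketch assumes a 70/30 split, 100 repetitions, and a one-variable no-intercept model in the style of p = 6d + ϵ; none of these settings come from the slides, the data are synthetic, and `fit_slope` and `repeated_holdout_mae` are hypothetical helper names.

```python
import random
from statistics import mean

def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def repeated_holdout_mae(xs, ys, repeats=100, test_fraction=0.3, seed=0):
    """Average testing MAE over `repeats` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    errors = []
    for _ in range(repeats):                          # step 3: repeat many times
        rng.shuffle(indices)                          # step 1: divide the dataset
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        b = fit_slope([xs[i] for i in train_idx],     # step 2: train ...
                      [ys[i] for i in train_idx])
        errors.append(mean(abs(b * xs[i] - ys[i])     # ... and predict test data
                           for i in test_idx))
    return mean(errors)                               # step 4: final MAE

# Synthetic data: 20 observations of y = 6x plus normal noise (sd 2).
data_rng = random.Random(7)
xs = list(range(1, 21))
ys = [6 * x + data_rng.gauss(0, 2) for x in xs]

cv_mae = repeated_holdout_mae(xs, ys)
print(f"cross-validated MAE = {cv_mae:.2f}")
```

Because every prediction is made on observations the model was not trained on, the averaged MAE estimates predictive power rather than explanatory power.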