Supervised Learning (Part 2)
Md. Shahidul Islam
Assistant Professor
Dept. of CSE, University of Asia Pacific
Linear Regression
• Linear Regression is a fundamental statistical method for predicting
continuous values.
• It models the relationship between a dependent variable (label) and
one or more independent variables (features) using a straight line.
• A model with two or more dependent variables is called multivariate
linear regression.
• A model with two or more independent variables is called multiple
linear regression (see the sketch after this slide).
2
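A minimal sketch of the distinction, using scikit-learn's LinearRegression on made-up numbers (the bedroom feature and the second "utility cost" target are illustrative assumptions, not from the slides): multiple linear regression means several feature columns in X; multivariate linear regression means several target columns in y.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Multiple linear regression: several independent variables, one label.
    X = np.array([[800, 2], [1200, 3], [1500, 3], [1800, 4]])  # size (sq. ft), bedrooms
    y = np.array([1.5, 2.0, 2.5, 3.0])                         # rent ($1000s)
    LinearRegression().fit(X, y)

    # Multivariate linear regression: one feature, several dependent variables.
    X2 = np.array([[800], [1200], [1500], [1800]])             # size only
    Y2 = np.array([[1.5, 0.9], [2.0, 1.1],                     # rent and a hypothetical
                   [2.5, 1.4], [3.0, 1.6]])                    # utility cost, both predicted
    LinearRegression().fit(X2, Y2)                             # one coefficient set per target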
Linear Regression
[Figure: scatter plot with a fitted regression line; the dependent variable (label) on the y-axis, the independent variable (feature) on the x-axis]
3
Linear Regression
Now we have to predict the rent for a new house size.
[Figure: fitted regression line with the predicted rent for a query house size marked; see note #4]
4
Mathematics Behind Linear Regression
• Hypothesis function: ŷ = Wx + b
• Cost function: Mean Squared Error (MSE) = (1/n) Σᵢ (yᵢ − ŷᵢ)² (see the sketch below)
5
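A minimal NumPy sketch of both formulas (W and b here are assumed values chosen for illustration, not fitted):

    import numpy as np

    # Toy data: house sizes (sq. ft) and rents ($1000s)
    x = np.array([800, 1200, 1500, 1800])
    y = np.array([1.5, 2.0, 2.5, 3.0])

    W, b = 0.0015, 0.3                   # assumed weight and bias, illustration only

    y_hat = W * x + b                    # hypothesis: y_hat = W*x + b
    mse = np.mean((y - y_hat) ** 2)      # cost: mean squared error
    print(y_hat, mse)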
Linear Regression
[Figure: several candidate regression lines fit to the same data, compared by their MSE; see note #6]
6
• Initialize the weight (W), bias (b), and learning rate (α).
• Compute the predicted value, ŷ = Wx + b.
• Compute the loss using Mean Squared Error (MSE), (1/n) Σᵢ (yᵢ − ŷᵢ)².
• Update the weight and bias using Gradient Descent: W ← W − α · ∂MSE/∂W, b ← b − α · ∂MSE/∂b.
• Repeat until convergence (a from-scratch sketch follows this slide).
Linear Regression Algorithm
7
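A from-scratch sketch of this loop in plain NumPy (the toy data, learning rate, and iteration budget are illustrative choices; the size feature is scaled to thousands of sq. ft so this learning rate stays stable):

    import numpy as np

    # Toy data: house size (1000s of sq. ft) and rent ($1000s)
    x = np.array([0.8, 1.2, 1.5, 1.8])
    y = np.array([1.5, 2.0, 2.5, 3.0])

    W, b, alpha = 0.0, 0.0, 0.1      # initialize weight, bias, learning rate
    n = len(x)

    for _ in range(5000):            # "repeat until convergence" (fixed budget here)
        y_hat = W * x + b                    # predicted value
        error = y_hat - y
        dW = (2 / n) * np.sum(error * x)     # gradient of MSE w.r.t. W
        db = (2 / n) * np.sum(error)         # gradient of MSE w.r.t. b
        W -= alpha * dW                      # gradient descent updates
        b -= alpha * db

    print(f"W = {W:.3f}, b = {b:.3f}")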
• Real-world applications have multiple independent variables.
• Multiple linear regression comes into action: ŷ = w₁x₁ + w₂x₂ + … + wₙxₙ + b.
• The formulation is similar to simple linear regression.
• Gradient Descent for Multiple Linear Regression updates each weight the same way: wⱼ ← wⱼ − α · ∂MSE/∂wⱼ (see the sketch after this slide).
Multiple Linear Regression
8
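A vectorized sketch of the same loop with several features (the two-feature toy data and learning rate are illustrative assumptions; each feature gets its own weight):

    import numpy as np

    # Toy data: [size (1000s of sq. ft), bedrooms] and rent ($1000s)
    X = np.array([[0.8, 2.0], [1.2, 2.0], [1.5, 3.0], [1.8, 3.0]])
    y = np.array([1.5, 2.0, 2.5, 3.0])

    w = np.zeros(X.shape[1])             # one weight per independent variable
    b, alpha, n = 0.0, 0.05, len(y)

    for _ in range(5000):
        y_hat = X @ w + b                      # predictions for all samples at once
        error = y_hat - y
        grad_w = (2 / n) * (X.T @ error)       # gradient of MSE w.r.t. every weight
        grad_b = (2 / n) * np.sum(error)
        w -= alpha * grad_w
        b -= alpha * grad_b

    print("weights:", w, "bias:", b)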
• Pros
  • Simple model
  • Computationally efficient
  • Interpretability of the output
• Cons
  • Overly simplistic
  • Linearity assumption
  • Severely affected by outliers
Pros and Cons
9
• Describes the relationship between the independent variable x and the
dependent variable y using an nth-degree polynomial in x.
• Characterizes fitting a nonlinear relationship between the value of x and the
conditional mean of y (a fitting sketch follows this slide).
• Types, by degree:
  • Linear – degree 1
  • Quadratic – degree 2
  • Cubic – degree 3, and so on as the degree grows.
Polynomial Regression
10
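A minimal sketch of a degree-2 polynomial fit with scikit-learn's PolynomialFeatures (the quadratic toy data is made up for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Toy data with a clearly nonlinear (quadratic) trend plus noise
    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 20).reshape(-1, 1)
    y = 0.5 * x.ravel() ** 2 + x.ravel() + rng.normal(0, 0.3, 20)

    # Expand x to [x, x^2], then fit an ordinary linear model on those features
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          LinearRegression())
    model.fit(x, y)
    print(model.predict(np.array([[1.5]])))   # prediction at a new x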
Why Polynomial Regression?
[Figure: a linear fit on linear data next to a degree-2 polynomial fit on nonlinear data; see note #11]
11
• Pros
  • Provides a close approximation of the relationship between the dependent and
independent variables
  • Can accommodate a wide range of functions
• Cons
  • One or two outliers in the data can have a significant impact on the fit
  • Fewer model validation methods exist for detecting outliers in nonlinear regression
Pros and Cons
12
Practice with Data
House Size (sq. ft)   Rent ($1000s)
800                   1.5
1200                  2.0
1500                  2.5
1800                  3.0

• Predicting house rent based on size.
• Fit a linear regression model and derive the best-fit line (a worked sketch follows this slide).
13
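A worked sketch of this exercise with scikit-learn (ordinary least squares under the hood):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # The practice data from the slide
    size = np.array([[800], [1200], [1500], [1800]])   # sq. ft
    rent = np.array([1.5, 2.0, 2.5, 3.0])              # $1000s

    model = LinearRegression().fit(size, rent)
    W, b = model.coef_[0], model.intercept_
    print(f"best-fit line: rent = {W:.5f} * size + {b:.3f}")
    print("rent for 1600 sq. ft:", model.predict([[1600]])[0])

On this data the fit works out to roughly rent ≈ 0.0015 · size + 0.25 (in $1000s), i.e. about $2,660 for a 1600 sq. ft house.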
Questions?
14


Editor's Notes

  • #2 Multivariate Linear Regression → multiple dependent variables (y₁, y₂, …); Multiple Linear Regression → multiple independent variables (x₁, x₂, …).
  • #3
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # Sample data: House size (sq ft) and Rent (in $1000s)
    data = {
        "House Size": [800, 1000, 1200, 1500, 1800, 2000, 2200],
        "Rent": [1.5, 2.0, 2.4, 3.0, 3.5, 3.8, 4.2]
    }

    # Prepare arrays for a simple linear regression model
    house_size = np.array(data["House Size"]).reshape(-1, 1)
    rent = np.array(data["Rent"])

    # Perform linear regression
    model = LinearRegression()
    model.fit(house_size, rent)
    predicted_rent = model.predict(house_size)

    # Scatter plot of the data and the fitted line
    plt.scatter(data["House Size"], data["Rent"], color='blue',
                label="Actual Data", edgecolors='k', s=100)
    plt.plot(data["House Size"], predicted_rent, color='red', label="Regression Line")

    # Labels and title
    plt.xlabel("House Size (sq ft)")
    plt.ylabel("Rent ($1000s)")
    plt.title("House Size vs Rent Prediction")
    plt.legend()

    # Display plot
    plt.show()
  • #4
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # Sample data: House size (sq ft) and Rent (in $1000s)
    data = {
        "House Size": [800, 1000, 1200, 1500, 1800, 2000, 2200],
        "Rent": [1.5, 2.0, 2.4, 3.0, 3.5, 3.8, 4.2]
    }

    # Convert data into numpy arrays
    house_size = np.array(data["House Size"]).reshape(-1, 1)
    rent = np.array(data["Rent"])

    # Initialize and train the Linear Regression model
    model = LinearRegression()
    model.fit(house_size, rent)

    # Predict rent for a new house size (query point)
    query_house_size = np.array([[1600]])  # House size: 1600 sq ft
    predicted_rent = model.predict(query_house_size)[0]

    # Scatter plot of actual data
    plt.scatter(data["House Size"], data["Rent"], color='blue',
                label="Actual Data", edgecolors='k', s=100)

    # Regression line
    predicted_rent_line = model.predict(house_size)
    plt.plot(data["House Size"], predicted_rent_line, color='red', label="Regression Line")

    # Mark the query point
    plt.scatter(query_house_size, predicted_rent, color='black', marker='x', s=200,
                label=f'Query Point (Predicted: ${predicted_rent*1000:.2f})')

    # Labels and title
    plt.xlabel("House Size (sq ft)")
    plt.ylabel("Rent ($1000s)")
    plt.title("House Size vs Rent Prediction with Query Point")
    plt.legend()

    # Display plot
    plt.show()
  • #6
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # Dataset: House size vs Rent with some variations
    data = {
        "House Size": [800, 1000, 1200, 1500, 1800, 2000, 2200],
        "Rent": [1.2, 2.5, 2.0, 3.5, 3.2, 4.5, 4.0]  # Slightly scattered values
    }

    # Convert data into numpy arrays
    house_size = np.array(data["House Size"]).reshape(-1, 1)
    rent = np.array(data["Rent"])

    # Create 3 different regression models
    model1 = LinearRegression()
    model2 = LinearRegression()
    model3 = LinearRegression()

    # Different training data for variety
    house_size1, rent1 = house_size[:-1], rent[:-1]  # Exclude last point
    house_size2, rent2 = house_size[1:], rent[1:]    # Exclude first point
    house_size3, rent3 = house_size, rent            # Full dataset

    # Fit models
    model1.fit(house_size1, rent1)
    model2.fit(house_size2, rent2)
    model3.fit(house_size3, rent3)

    # Predictions
    pred_rent1 = model1.predict(house_size)
    pred_rent2 = model2.predict(house_size)
    pred_rent3 = model3.predict(house_size)

    # Compute MSE
    mse1 = np.mean((rent - pred_rent1) ** 2)
    mse2 = np.mean((rent - pred_rent2) ** 2)
    mse3 = np.mean((rent - pred_rent3) ** 2)

    # Scatter plot for actual data
    plt.scatter(data["House Size"], data["Rent"], color='black',
                label="Actual Data", edgecolors='k', s=100)

    # Regression lines (all dotted)
    plt.plot(data["House Size"], pred_rent1, color='red', linestyle="dotted",
             label=f"Regression 1 (MSE: {mse1:.4f})")
    plt.plot(data["House Size"], pred_rent2, color='blue', linestyle="dotted",
             label=f"Regression 2 (MSE: {mse2:.4f})")
    plt.plot(data["House Size"], pred_rent3, color='green', linestyle="dotted",
             label=f"Regression 3 (MSE: {mse3:.4f})")

    # Draw vertical error lines to the data points
    for i in range(len(data["House Size"])):
        plt.plot([data["House Size"][i], data["House Size"][i]],
                 [rent[i], pred_rent1[i]], 'r--', alpha=0.5)  # Regression 1 error lines
        plt.plot([data["House Size"][i], data["House Size"][i]],
                 [rent[i], pred_rent2[i]], 'b--', alpha=0.5)  # Regression 2 error lines
        plt.plot([data["House Size"][i], data["House Size"][i]],
                 [rent[i], pred_rent3[i]], 'g--', alpha=0.5)  # Regression 3 error lines

    # Choose the best model based on MSE
    best_model = min((mse1, "Regression 1"), (mse2, "Regression 2"), (mse3, "Regression 3"))
    plt.title(f"House Size vs Rent Prediction (Best: {best_model[1]})")

    # Labels
    plt.xlabel("House Size (sq ft)")
    plt.ylabel("Rent ($1000s)")
    plt.legend()

    # Display plot
    plt.show()
  • #11
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # 1. Generate Sample Data (Shared Data)
    np.random.seed(0)
    n_samples = 12
    x = np.linspace(-5, 5, n_samples).reshape(-1, 1)  # Shared x values

    # Generate y values for linear relationship (with noise)
    y_linear = 2 * x + 1 + np.random.normal(0, 8, n_samples).reshape(-1, 1)

    # Generate y values for polynomial relationship (with noise)
    y_poly = 0.5 * x**2 + 2 * x - 1 + np.random.normal(0, 8, n_samples).reshape(-1, 1)

    # 2. Linear Regression (on the shared x, y_linear)
    linear_reg = LinearRegression()
    linear_reg.fit(x, y_linear)  # Fit to the linearly generated y
    y_linear_pred = linear_reg.predict(x)

    # 3. Polynomial Regression (on the shared x, y_poly)
    poly_features = PolynomialFeatures(degree=2, include_bias=False)
    x_transformed = poly_features.fit_transform(x)  # Transform x for polynomial regression
    poly_reg = LinearRegression()
    poly_reg.fit(x_transformed, y_poly)  # Fit to the polynomially generated y
    y_poly_pred = poly_reg.predict(x_transformed)

    # 4. Plotting (Two Subplots)
    plt.figure(figsize=(12, 5))

    # Linear Regression Plot
    plt.subplot(1, 2, 1)
    plt.scatter(x, y_linear, label="Linear Data")  # Scatter plot of the LINEAR data
    plt.plot(x, y_linear_pred, color='red', label="Linear Regression")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Linear Regression")
    plt.legend()

    # Polynomial Regression Plot
    plt.subplot(1, 2, 2)
    plt.scatter(x, y_poly, label="Polynomial Data")  # Scatter plot of the POLYNOMIAL data
    plt.plot(x, y_poly_pred, color='red', label="Polynomial Regression (Degree 2)")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Polynomial Regression")
    plt.legend()

    plt.tight_layout()
    plt.show()

    # Print coefficients (optional)
    print("Linear Regression Coefficients:", linear_reg.coef_, linear_reg.intercept_)
    print("Polynomial Regression Coefficients:", poly_reg.coef_, poly_reg.intercept_)