INT255: Mathematics behind Machine Learning
Unit 3: Dimensionality Reduction and Regression Techniques
Contents to be covered
• Principal Component Analysis
• Linear Discriminant Analysis
• Least Squares Approximation
• Minimum Norm Solution
• Regression Analysis: Linear, Multiple, Logistic
Principal Component Analysis (PCA)
• Aims to reduce dimensionality.
• Captures the directions that explain the maximum variance in the data.
• Compute the covariance matrix of the mean-centered data and its eigenvalues and eigenvectors.
• Select the top k eigenvectors corresponding to the largest eigenvalues.
• Project the data onto the selected eigenvectors (principal components), as sketched below.
• PCA is useful for visualizing high-dimensional data.
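A minimal NumPy sketch of these steps, assuming a data matrix X with samples in rows; the function name and toy data are illustrative only:

```python
import numpy as np

def pca(X, k):
    # Center the data so the covariance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # eigh: eigen-decomposition for the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the k eigenvectors with the largest eigenvalues.
    top = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, top]
    # Project the centered data onto the principal components.
    return X_centered @ components

X = np.random.rand(100, 5)       # toy data: 100 samples, 5 features
X_2d = pca(X, k=2)               # reduced to 2 dimensions, e.g. for plotting
print(X_2d.shape)                # (100, 2)
```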
Linear Discriminant Analysis (LDA)
• Reduces dimensionality.
• Maximizes the separability of classes.
• Calculate the within-class scatter matrix (S_W).
• Calculate the between-class scatter matrix (S_B).
• Compute the eigenvalues and eigenvectors of S_W^(-1) S_B.
• Select the top k eigenvectors.
• Project the data onto the selected eigenvectors (see the sketch below).
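A from-scratch sketch of these steps, assuming a data matrix X with samples in rows, integer class labels y, and a target dimension k; all names and the toy data are illustrative:

```python
import numpy as np

def lda_projection(X, y, k):
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))   # within-class scatter
    S_B = np.zeros((n_features, n_features))   # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        d = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (d @ d.T)
    # Discriminant directions: eigenvectors of S_W^(-1) S_B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    top = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, top].real
    return X @ W

X = np.random.rand(60, 4)                 # toy data: 60 samples, 4 features
y = np.repeat([0, 1, 2], 20)              # 3 classes
X_proj = lda_projection(X, y, k=2)        # at most (classes - 1) useful directions
print(X_proj.shape)                       # (60, 2)
```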
Linear Discriminant Analysis (LDA)
• LDA is a method used for both classification and dimensionality reduction.
• It finds a linear combination of features that best separates two or more classes of data.
• Geometrically, LDA finds a line (or plane) that best separates the classes.
• It uses the class means and scatter matrices to compute the optimal projection.
• The projected data is then used for classification (see the usage sketch below).
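In practice, scikit-learn's LinearDiscriminantAnalysis covers both roles; a short usage sketch on the Iris dataset, chosen here purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)    # projection that maximizes class separability
y_pred = lda.predict(X)             # the same fitted model can classify directly
print(X_proj.shape, (y_pred == y).mean())
```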
Least Squares Approximation
• Best-fit solution
• Overdetermined system of equations (more equations than
unknowns)
• Minimizes sum of squared differences (i.e., errors)
• Typically used in Regression Analysis
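A small sketch of the idea with NumPy's least-squares solver; the overdetermined system below is made up for illustration:

```python
import numpy as np

# Overdetermined system: 5 equations, 2 unknowns, so no exact solution in general.
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.],
              [1., 5.]])
b = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares solution minimizes ||Ax - b||^2.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("intercept and slope:", x)

# Equivalent normal-equations form: x = (A^T A)^(-1) A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x, x_normal))     # True
```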
Minimum Norm Solution
• Underdetermined systems of linear equations
• More unknowns than equations
• Infinite solutions possible
• Solution that has smallest “norm” (i.e., magnitude)
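A sketch using the Moore-Penrose pseudoinverse, which returns the minimum-norm solution of an underdetermined system; the numbers are illustrative:

```python
import numpy as np

# Underdetermined system: 2 equations, 4 unknowns, so infinitely many solutions.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 3.]])
b = np.array([3., 5.])

# The pseudoinverse picks the solution with the smallest Euclidean norm.
x_min = np.linalg.pinv(A) @ b
print(x_min, np.linalg.norm(x_min))
print(np.allclose(A @ x_min, b))    # it still satisfies every equation
```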
Regression Analysis
• Relationship between a dependent variable (target) and
one or more independent variables (predictors)
• Linear Regression:
Relationship between a dependent and one independent variable
Straight line model: y = mx + c
e.g., house_price = m(size_sq_ft) + c (see the sketch below)
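A short scikit-learn sketch, using made-up size/price pairs to echo the slide's example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up size/price pairs echoing the slide's example.
size_sq_ft = np.array([[800.], [1000.], [1200.], [1500.], [1800.]])
house_price = np.array([120000., 150000., 180000., 225000., 270000.])

model = LinearRegression().fit(size_sq_ft, house_price)
print("slope m:", model.coef_[0], "intercept c:", model.intercept_)
print("price for 1400 sq ft:", model.predict([[1400.]])[0])
```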
Regression Analysis
• Multiple Linear Regression
Relationship between the dependent and two or more independent variables
e.g., y = b0 + b1*x1 + b2*x2 + ... + bn*xn (see the sketch below)
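The same scikit-learn sketch extended to several predictors; the predictors and values here are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical predictors: size (sq ft), bedrooms, age (years).
X = np.array([[800.,  2., 20.],
              [1000., 2., 15.],
              [1200., 3., 10.],
              [1500., 3.,  5.],
              [1800., 4.,  2.]])
y = np.array([118000., 152000., 185000., 230000., 275000.])

model = LinearRegression().fit(X, y)
print("coefficients b1..b3:", model.coef_)
print("intercept b0:", model.intercept_)
```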
Regression Analysis
• Logistic Regression
The dependent variable is categorical
e.g., binary classification such as "yes" or "no"
Model: p = 1 / (1 + e^(-(b0 + b1*x))); p is the probability of the outcome (see the sketch below)
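A tiny sketch of this sigmoid model; the coefficients b0, b1 and the input x below are made-up values:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Made-up coefficients b0, b1 and a single input x.
b0, b1 = -4.0, 0.05
x = 100.0
p = sigmoid(b0 + b1 * x)            # probability of the positive outcome
print(p, "-> class", int(p >= 0.5))
```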
Logistic Regression
• Logistic Regression is a supervised machine learning algorithm used for binary classification tasks.
• The goal is to predict the probability of an input belonging to one of two classes (e.g., 0 or 1).
• Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts probabilities.
• Data points are classified based on a threshold (e.g., 0.5), as in the sketch below.
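A short classification sketch with scikit-learn's LogisticRegression on made-up pass/fail data, applying the 0.5 threshold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary data: hours studied vs. pass (1) / fail (0).
hours = np.array([[1.], [2.], [3.], [4.], [5.], [6.], [7.], [8.]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)
p = clf.predict_proba([[4.5]])[0, 1]   # predicted probability of passing
print(p, "-> class", int(p >= 0.5))    # classify with a 0.5 threshold
```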
Questions, please
