2. TOPICS TO BE COVERED…
Linear Models: Least Squares method, Multivariate Linear Regression, Regularized Regression, Bias/Variance Trade-off, Dimension Reduction
Logistic Regression, Gradient Descent
Perceptron, Support Vector Machines, Soft Margin SVM
Time Series Analysis, Forecasting
3. What Is the Least Squares Method?
The "least squares" method is a form of mathematical regression analysis used to determine the line of best fit for a set of data, providing a visual demonstration of the relationship between the data points. Each data point represents the relationship between a known independent variable and an unknown dependent variable.

What Does the Least Squares Method Tell You?
The least squares method provides the overall rationale for the placement of the line of best fit among the data points being studied. The most common application of this method, sometimes referred to as "linear" or "ordinary" least squares, aims to create a straight line that minimizes the sum of the squared errors generated by the associated equations, that is, the squared residuals arising from the differences between the observed values and the values the model predicts.

This method of regression analysis begins with a set of data points to be plotted on an x- and y-axis graph.
4. An analyst using the least squares method will generate a line of best fit that explains the potential relationship between independent and dependent variables.

In regression analysis, dependent variables are illustrated on the vertical y-axis, while independent variables are illustrated on the horizontal x-axis. These designations form the equation for the line of best fit, which is determined from the least squares method.

When we fit a regression line to a set of points, we assume that there is some unknown linear relationship between Y and X, and that for every one-unit increase in X, Y increases by some set amount on average. Our fitted regression line enables us to predict the response, Y, for a given value of X. But for any specific observation, the actual value of Y can deviate from the predicted value. The deviations between the actual and predicted values are called errors, or residuals.
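The slides state the fitting criterion in words only. For reference, the standard closed-form least-squares estimates for a line y = a + b·x (these are the textbook formulas, not shown on the slides) are:

b = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)² ,  a = ȳ − b·x̄

where x̄ and ȳ are the sample means, b is the slope, and a is the intercept of the line of best fit.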
5. Let's look at the method of least squares from another perspective. Imagine that you've plotted some data using a scatterplot, and that you fit a line for the mean of Y through the data. Let's lock this line in place, and attach springs between the data points and the line. Some of the data points are further from the mean line, so these springs are stretched more than others. The springs that are stretched the furthest exert the greatest force on the line.

What if we unlock this mean line, and let it rotate freely around the mean of Y? The forces on the springs balance, rotating the line. The line rotates until the overall force on the line is minimized.

There is some cool physics at play here, involving the relationship between force and the energy needed to pull a spring a given distance. It turns out that minimizing the overall energy in the springs is equivalent to fitting a regression line using the method of least squares.
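To make this concrete, here is a minimal NumPy sketch of the closed-form fit given after slide 4; the data points are invented purely for illustration:

```python
import numpy as np

# Toy data: y is roughly 2x + 1 plus noise (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

# Closed-form least-squares estimates for the line y = a + b*x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Residuals: deviations between the actual and predicted values.
residuals = y - (a + b * x)

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.3f}")
```

Any other line through these points would give a larger sum of squared residuals; that is exactly the sense in which this line is the "best fit".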
7. Multivariate Regression is one of the simplest Machine Learning algorithms. It comes under the class of Supervised Learning algorithms, i.e., those where we are provided with a training dataset.

Multivariate Regression is a method used to measure the degree to which more than one independent variable (the predictors) and more than one dependent variable (the responses) are linearly related. The method is broadly used to predict the behavior of the response variables associated with changes in the predictor variables, once a desired degree of relation has been established.

This is quite similar to the simple linear regression model we discussed previously, but with multiple independent variables contributing to the dependent variable, and hence multiple coefficients to determine and more complex computation due to the added variables.

Jumping straight into the equation of multivariate linear regression:
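The equation itself appeared as an image on the original slide. Using the weight notation the deck adopts on slide 10, a single response is modeled as Y ≈ W_0 + W_1 X_1 + ⋯ + W_P X_P. For the fully multivariate case (q responses predicted at once), a standard matrix formulation (supplied here as a reference point, not reproduced from the slide) is:

Y = X W + E

where Y is the n × q matrix of responses, X is the n × (p + 1) matrix of predictors (with a leading column of ones for the bias), W is the (p + 1) × q matrix of coefficients in which each column holds the coefficients for one response variable, and E is the n × q matrix of errors.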
8. A researcher has collected data on three psychological variables, four academic variables (standardized test scores), and the type of educational program the student is in for 600 high school students. She is interested in how the set of psychological variables is related to the academic variables and the type of program the student is in.

A doctor has collected data on cholesterol, blood pressure, and weight. She also collected data on the eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate are consumed per week). She wants to investigate the relationship between the three measures of health and eating habits.

A property dealer wants to set housing prices based on various factors like the size of the house, the number of bedrooms, the age of the house, etc.

Note:
Multiple Regression: The Multiple Regression model relates more than one predictor and one response.
Multivariate Regression: The Multivariate Regression model relates more than one predictor and more than one response.
10. Regularization is used as a solution to get rid of the overfitting problem in multivariate regression, but it can be used in both univariate and multivariate regression.

In general, regularization means to make things regular or acceptable. In the context of machine learning, regularization is the process which regularizes or shrinks the coefficients towards zero; in simple words, regularization discourages learning a more complex or flexible model, to prevent overfitting.

How Does Regularization Work?
The basic idea is to penalize complex models, i.e., to add a complexity term that gives a bigger loss for complex models. To understand it, let's consider a simple relation for linear regression. Mathematically, it is stated as:

Y ≈ W_0 + W_1 X_1 + W_2 X_2 + ⋯ + W_P X_P

where Y is the value to be predicted; X_1, X_2, …, X_P are the features deciding the value of Y; W_1, W_2, …, W_P are the weights attached to the features X_1, X_2, …, X_P respectively; and W_0 represents the bias.
11. Regularization keeps all the features in multivariate regression but reduces the magnitudes of the parameters θ_j (θ denotes the weights of the function).

Cost Function: a measure of the performance of an ML model for given data. It quantifies the error between the predicted and expected values in the form of a single real number. Depending upon the problem, the cost function can be formed in many different ways.

Now, in order to fit a model that accurately predicts the value of Y, we require a loss function and optimized parameters, i.e., bias and weights. The loss function generally used for linear regression is called the residual sum of squares (RSS). According to the above-stated linear regression relation, it can be given as:

RSS = Σ_{i=1}^{n} ( Y_i − (W_0 + W_1 X_{i1} + W_2 X_{i2} + ⋯ + W_P X_{iP}) )²
12. Regularization Techniques
There are two main regularization techniques, namely Ridge Regression and Lasso Regression. They differ in the way they assign a penalty to the coefficients. They are also known as L1 (Lasso Regression) and L2 (Ridge Regression) regularization.

Ridge Regression (L2)
Ridge Regression is a technique which comes into the picture when the data suffers from multicollinearity (which simply means that the independent variables are highly correlated). Under multicollinearity, even though the least squares estimates are unbiased, their variances are large, which results in observed values deviating far from the true values.

Here, the observed value is the predicted value, and the true value is the actual value.
13. By adding a degree of bias to the regression estimates, ridge regression is able to reduce the standard error.

So, starting from linear regression:
Y = a + b * X
and adding an error term (the degree of bias):
Y = a + b * X + e
(the error term is the value needed to correct the prediction error between the observed and predicted values)
Y = a + b1*X1 + b2*X2 + …… + e

In a linear equation, it is possible to decompose the prediction error into sub-components: the first component is due to bias, and the second is due to variance. Prediction error mostly arises from one of these two components, or both.
14. Ridge Regression solves the multicollinearity problem through shrinkage, controlled by the parameter lambda (λ). The objective has two components: the first is the least squares term, and the second is λ times the summation of β² (beta squared):

Ridge objective = RSS + λ Σ β_j²

β is the coefficient; this penalty is added to the least squares term in order to shrink the parameters so that they have a very low variance.

Important points: ridge shrinks the values of the coefficients, but they never reach zero. This regularization is called L2 Regularization.
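As a concrete illustration of L2 shrinkage, the following sketch uses scikit-learn's Ridge on synthetic, highly correlated data (all values invented for illustration). As the penalty strength grows, the coefficients shrink toward, but never exactly to, zero:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Two nearly identical features, to mimic multicollinearity.
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

# alpha is scikit-learn's name for the shrinkage parameter lambda.
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))
```

With a tiny penalty the two correlated coefficients are large and unstable; larger penalties trade a little bias for much lower variance, exactly as the slide describes.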
15. Lasso stands for least absolute shrinkage and selection operator.
Lasso Regression is a type of linear regression that uses shrinkage, where data values are shrunk towards a central point, like the mean. The lasso procedure encourages simple, sparse models. It is well suited for models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, like variable selection or elimination.

Lasso was introduced in order to improve the prediction accuracy and interpretability of regression models. This is done by taking only a subset of the provided covariates for use in the final model, rather than using all of them. Lasso is an alternative that avoids many of the problems of overfitting in a model.
16. Lasso regression performs L1 regularization, which adds a penalty equal to the sum of the absolute values of the coefficients to the optimization objective:

Lasso objective = RSS + λ Σ |β_j|

where RSS stands for the least squares objective (the linear regression objective without regularization) and λ is the tuning factor that controls the amount of regularization. The bias increases with an increasing value of λ, and the variance decreases as the amount of shrinkage (λ) increases.

Here the tuning factor λ controls the strength of the penalty:
When λ = 0: we get the same coefficients as simple linear regression.
When λ = ∞: all coefficients are zero.
When 0 < λ < ∞: we get coefficients between 0 and those of simple linear regression.
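The three λ regimes listed above can be observed directly. Here is a minimal scikit-learn sketch (synthetic data, invented for illustration); note that, unlike ridge, lasso drives some coefficients to exactly zero as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter.
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha is scikit-learn's name for the tuning factor lambda.
for alpha in [0.001, 0.1, 10.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coef, 3))
```

A small alpha recovers nearly the plain least-squares coefficients, a moderate alpha zeroes out the three irrelevant features (the variable-selection behavior described above), and a very large alpha zeroes out everything.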
17. Why Do You Need to Apply a Regularization Technique?
Often, a linear regression model comprising a large number of features suffers from some of the following:
Overfitting: the model fails to generalize to unseen datasets.
Multicollinearity: the model suffers from the multicollinearity effect.
Computationally Intensive: the model becomes computationally intensive.

When Do You Need to Apply Regularization Techniques?
Once the regression model is built and one of the following symptoms appears, you could apply one of the regularization techniques:
Lack of generalization: a model found with higher accuracy fails to generalize on unseen or new data.
Model instability: different regression models can be created with different accuracies, and it becomes difficult to select one of them.
18. Bias and variance are ways of measuring the difference between your predictions and the actual outcomes.

Bias is also called error. It quantifies how much, on average, the predicted values differ from the actual values (the gap between your predicted value and the actual value or outcome).

Variance quantifies how much the predictions made for the same observation differ from each other (when your predicted values are scattered all over the place).

A high bias error results in an under-performing model that keeps missing important trends. A high variance model will overfit on your training population and perform badly on any observation beyond training.
19. High Bias / High Variance: consistently wrong, in an inconsistent way.
High Bias / Low Variance: consistently wrong.
Low Bias / High Variance: right on average, but scattered around the bull's-eye target.

High bias can lead to missing the relevant data or features needed for the target value; in other words, it leads to underfitting. High variance can lead to fitting random noise in the training data, which can deviate the output; this leads to overfitting.

In order to have a perfect fit in the model, bias and variance should be balanced.
20. The following bulls-eye diagram explains the tradeoff better. The center, i.e., the bull's eye, is the model result we want to achieve: one that perfectly predicts all the values correctly. As we move away from the bull's eye, our model starts to make more and more wrong predictions.

A model with low bias and high variance predicts points that are generally around the center, but pretty far away from each other. A model with high bias and low variance is pretty far away from the bull's eye, but since the variance is low, the predicted points are close to each other.

We learned that an ideal model would be one where both the bias error and the variance error are low. However, we should always aim for a model where the model score for the training data is as close as possible to the model score for the testing data. That is how we choose a model that is neither too complex (high variance and low bias), which would lead to overfitting, nor too simple (high bias and low variance), which would lead to underfitting.

Bias and variance play an important role in deciding which predictive model to use.
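One way to see the tradeoff numerically is to compare training and testing scores as model complexity grows. Here is a hedged sketch (the data, degrees, and model choices are illustrative assumptions, not taken from the slides):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Degree 1 tends to underfit (high bias); degree 15 tends to overfit (high variance).
for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, round(model.score(X_tr, y_tr), 3), round(model.score(X_te, y_te), 3))
```

The balanced model is the one whose training score and testing score are close to each other and both reasonably high, matching the guidance above.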
21. In ML classification, we often encounter datasets with a large number N of
dimensions (features/parameters/attributes).
The motivation behind dimensionality reduction is to cut down (remove/
eliminate) unwanted dimensions or features while still classifying the dataset
into the correct class.
Dimensionality reduction can also be described as the process of converting data
with a given set of base dimensions into data with fewer dimensions, while
ensuring that it conveys the same or similar information.
Let's understand this with an example. Suppose we have two dimensions, X1 and X2,
which record the measurements of several objects in centimetres (X1) and inches (X2).
If we use both dimensions in machine learning, they convey essentially the same
information and introduce redundancy (and noise) into the system, so it is better
to use one dimension in place of two.
We then convert the data from 2-D (X1 and X2) to 1-D (Z1).
22. The process of dimensionality reduction can be divided into two main types:
feature selection and feature extraction.
Methods: dimensionality reduction techniques
1. Missing Value Ratio: a column with too many missing values carries little
information, so such features are dropped.
2. Low Variance Filter: a feature whose values barely vary carries little
information, so features with variance below a threshold are removed.
3. High Correlation Filter: if one feature contributes the same information as
another feature at the same time, we see high correlation between the two;
since the information derived from both features is the same, we tend to
remove one of them.
4. PCA (Principal Component Analysis): the resulting components are orthogonal in
nature. It is a tool that can reduce a large set of variables to a small set
that still contains most of the information the large set had. Mathematically,
it transforms a number of possibly correlated variables into a smaller number
of uncorrelated variables called principal components (see the sketch after
this list).
5. Backward feature elimination: starting from n features, we train the model with
n, n-1, n-2, ... features and check the error rate at each step. A feature whose
removal does not increase the error rate is left out; if the error rate increases
when it is removed, we keep that feature for model building going ahead.
6. Forward feature selection: here we first create an empty list of features, and
then repeatedly add the feature that reduces the error the most, training with
1, 2, 3, ... features and so on.
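A minimal PCA sketch, assuming scikit-learn with the Iris data as an illustrative dataset: four correlated features are reduced to two orthogonal principal components while keeping most of the variance.

    # Hedged sketch: PCA reducing 4 features to 2 uncorrelated components
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X = StandardScaler().fit_transform(load_iris().data)  # standardize first
    pca = PCA(n_components=2)
    Z = pca.fit_transform(X)                       # 4-D data -> 2-D data
    print(Z.shape, pca.explained_variance_ratio_)  # variance kept per component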
23. Logistic regression is used for a different class of problems, known as
classification problems.
Here the aim is to predict the group to which the current object under observation
belongs.
It gives you a discrete binary outcome between 0 and 1.
A simple example would be predicting whether a person will vote or not in an
upcoming election.
How does it work?
Logistic regression measures the relationship between the dependent variable
(our label, what we want to predict) and one or more independent variables
(our features) by estimating probabilities using its underlying logistic function.
It uses the sigmoid function, which is given as σ(z) = 1 / (1 + e^(-z)).
24. The sigmoid function is an S-shaped curve that can take any real-valued number
and map it into a value between 0 and 1, but never exactly at those limits.
Making predictions:
These probabilities must then be transformed into binary values in order to actually
make a prediction.
This is the task of the logistic function, also called the sigmoid function.
The values between 0 and 1 are then transformed into either 0 or 1 using a
threshold classifier.
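A minimal numpy sketch (with made-up raw scores) of the sigmoid followed by a 0.5 threshold classifier:

    # Hedged sketch: sigmoid squashes scores into (0, 1); a threshold binarizes
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    scores = np.array([-3.0, -0.5, 0.0, 2.0])  # raw outputs, e.g. w.x + b
    probs = sigmoid(scores)                    # strictly between 0 and 1
    labels = (probs >= 0.5).astype(int)        # threshold classifier
    print(probs, labels)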
Logistic vs Linear:
Logistic regression gives you a discrete outcome, whereas linear regression gives
you a continuous outcome.
26. Time Series Analysis is also known as TSA.
TSA consists of methods used to analyse time-ordered data and extract meaningful
statistics and other characteristics from it.
TSA is used for continuous data, for example the economic growth of an organization,
share prices, sales, temperature, weather, etc.
A TSA model has time t as the independent variable, and the target is a dependent
variable denoted by Yt.
The output of a time series model is the predicted value of Y at a given time t.
A time series is produced by recording data at regular intervals of time.
TSA components:
TRENDS, CYCLES, SEASONALITY
27. Trends:
The behaviour of the feature over a given span of time; a trend can be categorized
as increasing, decreasing, or constant.
Seasonality:
A pattern which repeats at a constant frequency. For example, demand for umbrellas
peaks in the rainy season.
Cycles:
A seasonality-like pattern that does not repeat at a regular frequency. Cycles can
generally be thought of in terms of task completion time.
Example: in the iterative model of software engineering, every iteration can take a
different amount of time, yet every task has to pass through all stages within a
single iteration.
The most widely used time series model is the Autoregressive Moving Average (ARMA),
which has two parts: the autoregressive (AR) part and the moving average (MA) part.
28. The process of making predictions about the future based on present and past
data, most commonly by analysing trends, is called forecasting.
Steps for forecasting:
1. Define the goal or business objective.
2. Get the required data.
3. Explore and visualize the series.
4. Pre-process the data.
5. Partition the series.
6. Apply a suitable forecasting model, e.g. an ARMA model (see the sketch after this list).
7. Evaluate and compare the performance of the system.
8. Implement the final forecasting system.
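A minimal sketch of steps 5-7, assuming statsmodels' ARIMA class; the toy AR(1) series and the ARMA(1,1) order are illustrative choices, not prescribed by the slides.

    # Hedged sketch: partition a series, fit an ARMA-type model, evaluate
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.RandomState(0)
    series = np.zeros(120)
    for t in range(1, 120):                      # toy stationary AR(1) data
        series[t] = 0.7 * series[t - 1] + rng.normal()

    train, test = series[:100], series[100:]     # step 5: partition the series
    model = ARIMA(train, order=(1, 0, 1)).fit()  # step 6: ARMA(1,1) = ARIMA(1,0,1)
    forecast = model.forecast(steps=len(test))   # forecast the held-out points
    print(((forecast - test) ** 2).mean())       # step 7: evaluate via MSE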
29. What is a neural network?
A neural network is formed when a collection of nodes or neurons are interlinked
through synaptic connections.
An artificial neural network has three types of layers – an input layer, hidden
layers, and an output layer.
The input layer that is formed from a collection of several nodes or neurons receives
inputs.
Every neuron in the network has a function, and every connection has a weight
value associated with it.
Inputs then move from the input layer to a layer made from a separate set of neurons
– the hidden layer. The output layer gives the final outputs.
30. Perceptron
A perceptron is a neural network unit (an artificial neuron)
that does certain computations to detect features or
business intelligence in the input data.
A perceptron, a neuron’s computational prototype, is
categorized as the simplest form of a neural network.
Frank Rosenblatt invented the perceptron at the Cornell
Aeronautical Laboratory in 1957.
A perceptron has one or more inputs, a process,
and only one output.
The concept of perceptron has a critical role in machine
learning.
It is used as an algorithm or a linear classifier to facilitate
supervised learning of binary classifiers.
Supervised learning is amongst the most researched of
learning problems.
A supervised learning sample always consists of an input
and a correct/explicit output.
The objective of this learning problem is to use
correctly labelled data to train a model that can
make predictions on future data.
Common supervised learning problems include
classification, i.e. predicting class labels.
31. The linear classifier that the perceptron is categorized as is a classification
algorithm which relies on a linear predictor function to make predictions.
Its predictions are based on a combination of the weights and the feature
vector.
The linear classifier suggests two categories for the classification of training
data: if classification is done for two categories, then the entire training
data will fall under these two categories.
The perceptron algorithm, in its most basic form, finds its use in the binary
classification of data.
The perceptron takes its name from the neuron, the basic unit after which the
artificial neuron is modelled.
32. There are two types of Perceptrons:
Single layer and Multilayer.
Single layer Perceptrons can learn only
linearly separable patterns.
Multilayer Perceptrons, or feedforward
neural networks with two or more layers,
have greater processing power.
The Perceptron algorithm learns the
weights for the input signals in order to
draw a linear decision boundary.
This enables you to distinguish between
the two linearly separable classes +1 and
-1.
33. Perceptron Learning Rule
states that the algorithm
would automatically learn
the optimal weight
coefficients.
The input features are then
multiplied with these
weights to determine if a
neuron fires or not.
The Perceptron receives
multiple input signals, and if
the sum of the input signals
exceeds a certain threshold,
it either outputs a signal or
does not return an output.
In the context of supervised
learning and classification,
this can then be used to
predict the class of a sample.
34. A perceptron is a function that maps its input x, which is multiplied by the
learned weight coefficients, to an output value f(x):
f(x) = 1 if w · x + b > 0, and 0 otherwise, where w · x = w1x1 + w2x2 + … + wmxm.
In the equation given above:
w = vector of real-valued weights
b = bias (an element that adjusts the boundary away from the origin without any
dependence on the input value)
x = vector of input x values
m = number of inputs to the perceptron
The output can be represented as 1 or 0. It can also be represented as 1 or -1,
depending on which activation function is used.
35. A perceptron accepts inputs, moderates them with certain weight values, then
applies the transformation function to output the final result.
Consider, for example, a perceptron with a Boolean output.
A Boolean output is based on inputs such as salaried, married, age, past credit
profile, etc. It has only two values: Yes and No, or True and False.
The summation function ∑ multiplies all inputs x by their weights w and then
adds them up as follows:
∑ = w1x1 + w2x2 + … + wmxm
36. The activation function applies a step rule (converting the numerical output into
+1 or -1) to check whether the output of the weighting function is greater than zero.
The step function is triggered above a certain value of the neuron output;
otherwise it outputs zero.
The sign function outputs +1 or -1 depending on whether the neuron output is greater
than zero or not.
The sigmoid is the S-curve and outputs a value between 0 and 1.
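A minimal numpy sketch of the three activation rules described above:

    # Hedged sketch: step, sign, and sigmoid activations
    import numpy as np

    def step(z):      # fires (1) above the threshold 0, else outputs 0
        return np.where(z > 0, 1, 0)

    def sign(z):      # +1 or -1 depending on the sign of the neuron output
        return np.where(z > 0, 1, -1)

    def sigmoid(z):   # S-curve squashing z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-1.5, 0.2, 3.0])
    print(step(z), sign(z), sigmoid(z))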
37. Steps to perform the perceptron learning algorithm:
1. Feed the features of the model to be trained as input to the first layer.
2. Multiply all weights by their inputs and add up the results.
3. Add the bias value to shift the output function.
4. Present this value to the activation function (the type of activation function
depends on the need).
5. The value received after the last step is the output value.
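A minimal sketch of these five steps in numpy, training a perceptron on the linearly separable AND function; the learning rate, epoch count, and step activation are illustrative choices.

    # Hedged sketch: the perceptron learning loop on the AND problem
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])                 # AND labels

    w, b, eta = np.zeros(2), 0.0, 0.1          # weights, bias, learning rate
    for epoch in range(10):
        for xi, target in zip(X, y):
            z = np.dot(w, xi) + b              # steps 1-3: weighted sum + bias
            output = 1 if z > 0 else 0         # step 4: step activation
            error = target - output            # perceptron learning rule:
            w += eta * error * xi              # nudge weights toward the target
            b += eta * error

    print(w, b, [1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # step 5: outputs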
38. A Support Vector Machine (SVM) is a
supervised machine learning
algorithm which can be used for both
classification and regression challenges.
However, it is mostly used for
classification problems.
In the SVM algorithm, we plot each data
item as a point in n-dimensional space
(where n is the number of features you have),
with the value of each feature being the
value of a particular coordinate.
Then, we perform classification by
finding the hyper-plane that
differentiates the two classes very well.
Support vectors are simply the
coordinates of individual observations.
The SVM classifier is a frontier
(hyper-plane/line) which best segregates
the two classes.
39. Hyperparameters of the Support Vector Machine (SVM)
Algorithm
There are a few important parameters of SVM that you
should be aware of before proceeding further:
Kernel: A kernel helps us find a hyperplane in the higher
dimensional space without increasing the computational
cost. Usually, the computational cost will increase if the
dimension of the data increases. This increase in dimension
is required when we are unable to find a separating
hyperplane in a given dimension and need to move
to a higher dimension.
Hyperplane: This is basically a separating line between two
data classes in SVM. In Support Vector Regression, this
is the line that will be used to predict the continuous output.
Decision Boundary: A decision boundary can be thought of
as a demarcation line (for simplification) on one side of
which lie positive examples and on the other side lie the
negative examples. On this very line, the examples may be
classified as either positive or negative. This same concept of
SVM applies in Support Vector Regression as well.
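A rough sketch of how these hyperparameters appear in practice, assuming scikit-learn's SVC; the RBF kernel, C value, and Iris data are illustrative choices, not prescribed by the slides.

    # Hedged sketch: kernel and C hyperparameters of an SVM classifier
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # kernel: implicit lift to a higher dimension; C: softness of the margin
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))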
40. How does it work?
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B
and C). Now, identify the right hyper-plane to classify the stars and circles.
Remember this thumb rule for identifying the right hyper-plane: "Select the
hyper-plane which segregates the two classes better". In this scenario, hyper-plane
"B" has performed this job excellently.
41. Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B
and C), and all segregate the classes well. Now, how can we identify the right
hyper-plane?
Here, maximizing the distance between the nearest data point (of either class) and
the hyper-plane will help us decide the right hyper-plane. This distance is called
the Margin.
Above, you can see that the margin for hyper-plane C is high compared to both A
and B. Hence, we name C the right hyper-plane. Another compelling reason for
selecting the hyper-plane with the higher margin is robustness: if we select a
hyper-plane having a low margin, then there is a high chance of misclassification.
42. Identify the right hyper-plane (Scenario-3): Hint: use the rules discussed in the
previous section to identify the right hyper-plane.
Some of you may have selected hyper-plane B, as it has a higher margin
compared to A. But here is the catch: SVM selects the hyper-plane which
classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B
has a classification error and A has classified everything correctly. Therefore, the
right hyper-plane is A.
43. Can we classify two classes
(Scenario-4)?
Below, we are unable to segregate the
two classes using a straight line, as one
of the stars lies in the territory of the
other (circle) class as an outlier.
As already mentioned, the one star
at the other end is like an outlier for the
star class. The SVM algorithm has a feature
to ignore outliers and find the hyper-
plane that has the maximum margin.
Hence, we can say that SVM classification is
robust to outliers.
44. Find the hyper-plane to segregate two classes
(Scenario-5):
In this scenario, we can't have a linear hyper-
plane between the two classes, so how does SVM
classify them? Till now, we have only
looked at linear hyper-planes.
SVM solves this problem easily, by introducing an
additional feature. Here, we add a new feature
z = x^2 + y^2, and plot the data points on the
x and z axes.
In the resulting plot, the points to consider are:
All values of z are always positive, because
z is the sum of the squares of x and y.
In the original plot, the red circles appear close to
the origin of the x and y axes, leading to lower values
of z, while the stars lie relatively far from the origin,
resulting in higher values of z.
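A small numpy sketch of this idea, with synthetic "circle" and "star" points of my own choosing: the hand-crafted feature z = x^2 + y^2 separates the two groups with a simple threshold.

    # Hedged sketch: the manual feature z = x^2 + y^2 linearizes ring-shaped data
    import numpy as np

    rng = np.random.RandomState(0)
    inner = rng.normal(scale=0.5, size=(50, 2))  # "circles" near the origin
    outer = rng.normal(size=(50, 2))
    outer = 3 * outer / np.linalg.norm(outer, axis=1, keepdims=True)  # "stars" on a ring

    for pts, label in [(inner, "circle"), (outer, "star")]:
        z = pts[:, 0] ** 2 + pts[:, 1] ** 2      # the new feature z (always >= 0)
        print(label, z.min().round(2), z.max().round(2))  # circles: small z; stars: large z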
45. What the soft margin does:
The soft-margin SVM gives more flexibility by
allowing some of the training points to be
misclassified.
It tolerates a few points being misclassified.
It tries to balance the trade-off between finding
a line that maximizes the margin and
minimizing the misclassification.
Two types of misclassification can happen:
1. The point is on the wrong side of the decision
boundary but on the correct side of (or on) the
margin (shown on the left).
2. The point is on the wrong side of the decision
boundary and on the wrong side of the margin
(shown on the right).
In either case, the support vector machine
tolerates those points being misclassified while it
tries to find the linear decision boundary.
46. What is Gradient Descent?
It is an optimization algorithm used to find the
values of the parameters, i.e. the coefficients of a
function f, that minimize a cost function.
It is defined as a first-order iterative optimization
algorithm for finding the minimum of a loss
function.
It is also one of the most popular and widely used
optimization algorithms.
Given a machine learning model with parameters
(weights and biases) and a cost function to judge
how good the model is, our learning problem reduces
to finding a good set of weights for our model which
minimizes the cost function.
(Cost function: a measure of the performance of an
ML model on given data.)
(Learning problem: a decision problem that needs to
be modelled from data.)
47. Gradient descent is an iterative method.
We start with some initial values for our model parameters (weights and biases),
and improve them slowly.
To improve a set of weights, we get a sense of the value of the cost function
for weights similar to the current weights (by calculating the gradient),
and move in the direction in which the cost function decreases.
Being an iterative methodology, this step is repeated thousands of
times; by this iterative process we minimize the cost function.
Let's look at the equations and formulas involved:
Gradient descent is used to minimize a cost function J(w) parameterized
by the model parameters w. The gradient (or derivative) shows us the
incline or slope of the cost function. So to minimize the cost function, we move in
the direction opposite to the gradient.
Let G be the gradient of the cost function with respect to the parameters at a
particular value w of the weight vector. That is,
G = ∇w J(w) = ∂J(w)/∂w
48. Thereafter, the gradient descent step is given by
w = w - ηG
where η is the learning rate, which determines the size of the steps taken to reach
the minimum.
Note: we need to be careful with this parameter: a high value of η
may step past the global minimum, while a low value will reach the minimum slowly.
49. Steps to perform gradient descent:
Step 1. Initialize the weights w randomly.
Step 2. Calculate the gradient G of the cost function w.r.t. the parameters.
Step 3. Update the weights by an amount proportional to G, i.e. w = w - ηG.
Step 4. Repeat until J(w) stops reducing or another pre-defined termination
criterion is met.
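A minimal sketch of these four steps, assuming a least-squares cost J(w) = ||Xw - y||^2 / (2n) on synthetic data; the learning rate and iteration count are illustrative.

    # Hedged sketch: batch gradient descent on a least-squares cost
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    w = rng.normal(size=3)                  # step 1: random initialization
    eta = 0.1                               # learning rate
    for _ in range(500):                    # step 4: repeat until J(w) settles
        G = X.T @ (X @ w - y) / len(y)      # step 2: gradient of the cost
        w = w - eta * G                     # step 3: w = w - eta * G
    print(w)                                # should approach true_w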
50. Let's think a little and understand more with the
example below.
Imagine you are driving a car blindfolded, and your
objective is to reach the lowest altitude in a valley.
One of the simplest strategies you can use is to feel
the ground through the tyres and move only where the
slope goes downward, always taking a step in the
direction where the ground is descending the fastest
(the direction in which the wheels roll most easily).
If you keep repeating this process, you might slide up
and down a little, but you will land up somewhere near
the bottom of the valley.
The altitude of the car is analogous to the cost function,
and minimizing the cost function is analogous to trying
to reach lower altitudes.
Feeling the slope through the car's wheels is
analogous to calculating the gradient, and taking a
step down the slope is analogous to one
iteration of the parameter update.
51. Finally, let's see the multiple variants of gradient descent.
There are multiple variants, used depending on the amount of data
employed to calculate the gradient.
The reason for this variation is computational efficiency: datasets can
have many (millions of) data points, so calculating the gradient over the
entire dataset is very expensive. Gradient descent is therefore divided into
batch gradient descent, stochastic gradient descent, and mini-batch
gradient descent.
Batch gradient descent
It computes the gradient of the cost function w.r.t. the parameters w over the
entire training data.
Since we need to calculate the gradients for the entire dataset to perform one
basic parameter update, batch gradient descent can be very slow.
52. Stochastic gradient descent
It computes the gradient for each training sample (xi), i.e. a single training
data point is used for each update.
Mini-batch gradient descent
Here we calculate the gradient for each small mini-batch of training data.
We perform it as follows:
First divide the training data into small batches (say M samples per batch), then
perform one update per mini-batch. M is usually in the range 30-500, depending on
the problem.
Amongst all of these, mini-batch and stochastic gradient descent are the most popular.
Mini-batch sizes are typically chosen to make good use of the available computing
infrastructure, such as GPUs or CPUs.
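A minimal sketch of the mini-batch variant, reusing the same synthetic least-squares setup as the earlier gradient descent sketch; M = 32 and the learning rate are illustrative choices.

    # Hedged sketch: mini-batch gradient descent, one update per batch of M
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.normal(size=(1000, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

    w, eta, M = np.zeros(3), 0.05, 32
    for epoch in range(20):
        idx = rng.permutation(len(y))              # shuffle each epoch
        for start in range(0, len(y), M):
            batch = idx[start:start + M]           # one mini-batch of M samples
            Xb, yb = X[batch], y[batch]
            G = Xb.T @ (Xb @ w - yb) / len(batch)  # gradient on the batch only
            w -= eta * G                           # one update per mini-batch
    print(w)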