Linear algebra
Why linear algebra is useful
In many machine learning and deep learning algorithms, both the input and the output are represented as vectors
A vector here simply means an ordered collection of numbers
An input (such as a picture or a sound) is first converted into numbers
Scalars: a scalar is a single number, a quantity that has magnitude only
Ex: the speed of a car (speed = 45 km/hr)
Scalars are written in italics, usually with lower-case variable names
Ex: let n ∈ N be the number of units
Vectors: a vector is an ordered array of numbers
Ex: let x be an input vector; its elements x1, x2, …, xn are put together (concatenated) to form the vector
Geometrically, a vector has both magnitude and direction
Ex: a car travelling east at a speed of 45 km/hr
Matrices: a matrix is a 2-D array of numbers
Tensors: a tensor is an array of numbers with more than two dimensions (axes), i.e. a multi-dimensional array
Multiplying matrices and vectors:
To define the product of a matrix A and a vector x, we view the vector as a column matrix.
If A is an m x n matrix and x has n entries, the product Ax is a column vector with m entries.
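A minimal sketch of the matrix-vector product in plain Python. The helper name mat_vec and the sample values are illustrative assumptions, not taken from the slides.

```python
def mat_vec(matrix, vector):
    """Multiply an m x n matrix (a list of rows) by a length-n vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have as many entries as the vector")
    # Entry i of the result is the dot product of row i with the vector.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[1, 2, 3],
     [4, 5, 6]]      # a 2 x 3 matrix
x = [1, 0, -1]       # viewed as a 3 x 1 column
y = mat_vec(A, x)    # a column with 2 entries
```

The result has one entry per row of A, which matches the m x n times n x 1 shape rule.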
Span and Linear dependence
Span: the span of a set of vectors is the set of all vectors obtained by taking every possible linear combination of the original vectors.
Ex: the span of the coordinate vectors v1 = (1,0) and v2 = (0,1) is the whole 2-D plane, because any vector (a,b) can be written as a v1 + b v2
Identity and inverse matrices:
Linear dependence: a set of vectors is linearly dependent if at least one of the vectors can be written as a linear combination of the other vectors.
Ex: v1 = (1,0), v2 = (0,1), v3 = (2,1)
v3 can be written as the linear combination v3 = 2 v1 + v2, so the set {v1, v2, v3} is linearly dependent
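The example can be checked with a small sketch that solves a v1 + b v2 = v3 for the coefficients. The function name coefficients_2d is a hypothetical helper, and the approach (Cramer's rule) only covers the 2-D case.

```python
def coefficients_2d(v1, v2, target):
    """Solve a*v1 + b*v2 = target for (a, b) using Cramer's rule (2-D only)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1 and v2 are linearly dependent; they do not span the plane")
    a = (target[0] * v2[1] - target[1] * v2[0]) / det
    b = (v1[0] * target[1] - v1[1] * target[0]) / det
    return a, b


v1, v2, v3 = (1, 0), (0, 1), (2, 1)
a, b = coefficients_2d(v1, v2, v3)
# Rebuild v3 from the coefficients to confirm the linear combination.
rebuilt = (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])
```

Because a solution exists, v3 lies in the span of v1 and v2, which is exactly what makes the three vectors linearly dependent.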
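Norms: a norm is a way of measuring the size (length) of a vector, matrix, or tensor
Norms are used to estimate how big a vector or tensor is
and to estimate how close one vector or tensor is to another
Mathematically, a norm is any function f that satisfies
f(x) = 0 implies x = 0
f(x + y) ≤ f(x) + f(y) (the triangle inequality)
f(αx) = |α| f(x) for every scalar α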
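As a rough illustration, three common vector norms and a norm-based distance can be written in plain Python. The function names are assumptions chosen for this sketch.

```python
import math


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """Euclidean length."""
    return math.sqrt(sum(v * v for v in x))


def max_norm(x):
    """Largest absolute entry."""
    return max(abs(v) for v in x)


def distance(x, y):
    """How close two vectors are: the L2 norm of their difference."""
    return l2_norm([a - b for a, b in zip(x, y)])


u = [3, -4]
w = [1, 2]
sizes = (l1_norm(u), l2_norm(u), max_norm(u))
```

Each function satisfies the three norm properties; the triangle inequality, for instance, can be spot-checked on u and w.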
Eigendecomposition:
One of the most widely used matrix decompositions is the eigendecomposition,
in which we decompose a matrix into a set of eigenvectors and eigenvalues
A nonzero vector v that the matrix only scales, without rotating it, is known as an eigenvector: Av = λv
The scaling factor λ (the stretch ratio) is known as the eigenvalue.
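The "pure scaling" property can be tested directly. The sketch below, with a hypothetical helper scaling_factor and an example matrix chosen for illustration, checks whether A v is a multiple of v. It assumes v is not the zero vector.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def scaling_factor(matrix, v, tol=1e-12):
    """Return the eigenvalue if v is an eigenvector of matrix, else None.

    Assumes v has at least one nonzero entry.
    """
    av = mat_vec(matrix, v)
    # Use the first nonzero entry of v to estimate the scaling factor.
    index = next(i for i, entry in enumerate(v) if entry != 0)
    factor = av[index] / v[index]
    if all(abs(a - factor * entry) <= tol for a, entry in zip(av, v)):
        return factor
    return None


A = [[2, 1],
     [1, 2]]
lam_1 = scaling_factor(A, [1, 1])     # A maps (1, 1) to (3, 3)
lam_2 = scaling_factor(A, [1, -1])    # A maps (1, -1) to (1, -1)
not_eigen = scaling_factor(A, [1, 0]) # A maps (1, 0) to (2, 1), a change of direction
```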
Singular value decomposition (SVD)
The SVD is closely related to the eigendecomposition and can be derived from it
The matrix A can be any m x n matrix; it does not have to be square
The Moore-Penrose pseudoinverse:
It is used when the matrix may not be invertible, for example when it is not square.
If A is invertible, then the Moore-Penrose pseudoinverse is equal to the matrix inverse
The pseudoinverse is also referred to as the generalized inverse.
Let A be a matrix of order m x n; its pseudoinverse is written A+
If the columns of A are linearly independent, then the pseudoinverse of A is
A+ = (A^T A)^-1 A^T
If the rows of A are linearly independent, then the pseudoinverse of A is
A+ = A^T (A A^T)^-1
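A small sketch of the independent-columns formula, A+ = (A^T A)^-1 A^T, for a matrix with exactly two columns so that a closed-form 2 x 2 inverse is enough. All helper names and the example matrix are assumptions made for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def left_pseudoinverse(a):
    """(A^T A)^-1 A^T for an m x 2 matrix with linearly independent columns."""
    a_t = transpose(a)
    return mat_mul(inverse_2x2(mat_mul(a_t, a)), a_t)


A = [[1, 0],
     [0, 1],
     [1, 1]]                     # 3 x 2, not square, so no ordinary inverse
A_plus = left_pseudoinverse(A)   # 2 x 3
product = mat_mul(A_plus, A)     # should be close to the 2 x 2 identity
```

Multiplying A+ by A on the right recovers the identity (up to rounding), which is the sense in which A+ acts as an inverse here.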
The Trace operator:
The trace operator gives the sum of all of the
diagonal entries of a matrix
The Determinant
The determinant of a square matrix A, denoted det(A), is a single number equal to the product of all the eigenvalues of A
Principal Component Analysis (PCA):
It is a dimensionality reduction method that is often used on large datasets: it transforms a large set of variables into a smaller one that still contains most of the information in the large set.
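To make the idea concrete, here is a minimal sketch of PCA for 2-D points that reduces them to one dimension. It uses the closed-form eigenvalues of the 2 x 2 covariance matrix, which is a simplification chosen for this example; the function name pca_2d and the sample points are assumptions. It also assumes the points are not all identical.

```python
import math


def pca_2d(points):
    """Return (unit direction, 1-D scores, explained-variance ratio) for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2 x 2 matrix.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_1, lam_2 = half_trace + radius, half_trace - radius

    # Eigenvector that belongs to the larger eigenvalue.
    if sxy != 0:
        direction = (lam_1 - syy, sxy)
    elif sxx >= syy:
        direction = (1.0, 0.0)
    else:
        direction = (0.0, 1.0)
    length = math.hypot(direction[0], direction[1])
    direction = (direction[0] / length, direction[1] / length)

    # Project each centered point onto the principal direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores, lam_1 / (lam_1 + lam_2)


points = [(1, 1), (2, 2), (3, 3), (4, 4)]
direction, scores, ratio = pca_2d(points)
```

For points that lie on a line, one component keeps essentially all of the variance, so the two original variables can be replaced by a single score per point.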
Probability and Information Theory
Random variables:
A random variable is a variable whose value is determined by the outcome of a random experiment
It is a function from the sample space to the real numbers
X : S -> R
where X is the random variable, S is the sample space, and R is the set of real numbers
The sample space is the set of all possible outcomes of a random experiment
Types of random variables
Discrete random variables: take a finite or countably infinite number of distinct values
Ex: 0, 1, 2, 3, 4, ...
Continuous random variables: take values from an infinite, uncountable set
Ex: interest rates of loans in a country
Probability Distributions
It gives the probability of each possible outcome of a random experiment or event
Probability Mass function:
If X is a discrete random variable with distinct values x1, x2, ..., xn, then the function p(x) defined as
p(xi) = P(X = xi), with p(xi) ≥ 0 and p(x1) + p(x2) + ... + p(xn) = 1
is called the probability mass function (PMF)
Probability density function:
A function f(x) is a probability density function (PDF) if
f(x) ≥ 0 for every x, and the integral of f(x) over all x equals 1
Marginal Probability:
The probability of an event irrespective of the outcome of another variable
In a joint distribution over several variables, the marginal distribution is simply the distribution of each individual variable on its own
For example, for a survey of 100 respondents, the marginal distribution of sports (as counts) is:
Baseball: 36
Basketball: 31
Football: 33
We could also write the marginal distribution of sports in percentage terms (i.e. out of the total of 100 respondents):
Baseball: 36 / 100 = 36%
Basketball: 31 / 100 = 31%
Football: 33 / 100 = 33%
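A sketch of how a marginal distribution is obtained by summing a joint table over the other variable. Only the sport totals (36, 31, 33) come from the slides; the split into "group A" and "group B" is hypothetical data invented for this illustration, as is the helper name marginal.

```python
from fractions import Fraction

# Hypothetical joint counts keyed by (group, sport). Only the per-sport
# totals 36 / 31 / 33 are taken from the slides.
joint_counts = {
    ("group A", "Baseball"): 13,
    ("group A", "Basketball"): 15,
    ("group A", "Football"): 20,
    ("group B", "Baseball"): 23,
    ("group B", "Basketball"): 16,
    ("group B", "Football"): 13,
}


def marginal(counts, axis):
    """Sum joint counts over every variable except the one at position axis."""
    totals = {}
    for key, count in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + count
    return totals


sport_counts = marginal(joint_counts, axis=1)
total = sum(sport_counts.values())
sport_probs = {sport: Fraction(count, total) for sport, count in sport_counts.items()}
```

Summing over the group variable removes it from the table, which is what "irrespective of the outcome of another variable" means in practice.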
Conditional Probability:
The probability that an event A occurs given that another event B has already occurred
P(A|B) = P(A,B) / P(B), defined when P(B) > 0
The chain rule of conditional probability
For three events A, B, C the chain rule is
P(A,B,C) = P(A) P(B|A) P(C|A,B)
The chain rule can be generalized to n events A1, A2, A3, ..., An:
P(A1,A2,...,An) = P(A1) P(A2|A1) P(A3|A1,A2) ... P(An|A1,...,An-1)
Independence and conditional independence
Two random variables X and Y are said to be statistically independent if and only if
P(X,Y) = P(X)P(Y)
Ex: Independent: X: throw of a die, Y: toss of a coin
Not independent: X: height, Y: weight
In general, as height increases, weight increases
Independence is equivalent to saying
P(y|x) = P(y) or P(x|y) = P(x)
That is, y does not depend on x: whether y happens has no relation to whether x happens
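The product rule for independence can be checked on a small joint distribution. In this sketch the die-and-coin table follows the example above, while the height and weight probabilities are made-up numbers used only to show a dependent case; the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the two marginal distributions of a joint table keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def is_independent(joint):
    """True when P(x, y) == P(x) * P(y) for every pair of values."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)


# Fair die and fair coin: every (face, side) pair has probability 1/12.
die_and_coin = {(d, c): Fraction(1, 12) for d, c in product(range(1, 7), "HT")}

# Hypothetical height/weight table in which tall people are more often heavy.
height_and_weight = {
    ("tall", "heavy"): Fraction(2, 5),
    ("tall", "light"): Fraction(1, 10),
    ("short", "heavy"): Fraction(1, 10),
    ("short", "light"): Fraction(2, 5),
}
```

Exact fractions are used so the equality test is not affected by floating-point rounding.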
Conditional Independence
Two random variables X and Y are said to be conditionally independent given Z if and only if
P(x,y | z) = P(x|z) P(y|z)
Ex (independent, and also conditionally independent given Z):
X: Throw of a die
Y: Toss of a coin
Z: Card drawn from a deck
Ex: X: Height, Y: Vocabulary, Z: Age
Without any condition, X and Y are not independent: if we are told that a person is just 2 feet tall, that person is most probably a child and so has a small vocabulary
X and Y become independent only once a particular condition is given
Suppose we fix the age at 30: among people aged 30, vocabulary does not depend on height
So the two variables are not independent, but they are conditionally independent given age
Expectation:
It gives the mean (the average, or expected value) of a random variable under its distribution
The distribution covers all possible values of the variable, not only the values that happen to be observed
Ex: expected return on a certain investment in the market
Ex: expected rainfall during the coming monsoon
The expectation, or expected value, of a function f(x) with respect to a probability distribution P(x) is the average value of f(x) when x is drawn from P
It is denoted by E_{x~P}[f(x)]
For a discrete variable: E_{x~P}[f(x)] = Σ_x P(x) f(x)
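The sum Σ_x P(x) f(x) translates almost directly into code. This sketch uses a fair six-sided die as an assumed example distribution; the function name expectation is illustrative.

```python
from fractions import Fraction


def expectation(pmf, f=lambda x: x):
    """Expected value of f(x) when x is drawn from the discrete distribution pmf."""
    return sum(p * f(x) for x, p in pmf.items())


# A fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_face = expectation(fair_die)                       # E[x]
mean_square = expectation(fair_die, lambda x: x * x)    # E[x^2]
```

Passing a different f changes the quantity being averaged without changing the distribution that x is drawn from.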
Multivariate expectation (multiple variables):
For a multivariate random variable x (a vector), the expectation is taken component by component: E[x] = (E[x1], E[x2], …, E[xn])
Linearity of Expectation:
If f is a linear combination of two other functions g and h, where α and β are scalars,
f(x) = α g(x) + β h(x)
then the expectation of f is
E[f] = α E[g] + β E[h]
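Linearity can be confirmed numerically on a small distribution. The choices below (a fair die, g(x) = x, h(x) = x^2, α = 2, β = -1) are arbitrary assumptions made only for this check.

```python
from fractions import Fraction


def expectation(pmf, f):
    """Expected value of f(x) under the discrete distribution pmf."""
    return sum(p * f(x) for x, p in pmf.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
alpha, beta = 2, -1


def g(x):
    return x


def h(x):
    return x * x


def f(x):
    # f is a linear combination of g and h.
    return alpha * g(x) + beta * h(x)


left_side = expectation(fair_die, f)
right_side = alpha * expectation(fair_die, g) + beta * expectation(fair_die, h)
```

Both sides come out equal, as the identity E[f] = α E[g] + β E[h] predicts.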
Variance:
It measures how much the values of a random variable vary from its expected value, i.e. the amount of fluctuation of the variable
Var(x) = E[(x - E[x])^2]
Ex: the variance of the returns on a certain investment in the market (a risk measure)
Covariance:
This is defined for a pair of variables x and y
It measures how much the two random variables vary together around their expected values
Cov(x, y) = E[(x - E[x]) (y - E[y])]
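A sketch of variance and covariance computed from paired observations, treating each observation as equally likely. The height and weight values are hypothetical numbers invented for the example, and the function names are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Average squared deviation from the mean: E[(x - E[x])^2]."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def covariance(xs, ys):
    """Average product of deviations: E[(x - E[x]) (y - E[y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical paired measurements (height in cm, weight in kg).
heights = [150, 160, 170, 180]
weights = [50, 60, 65, 85]

var_height = variance(heights)
cov_height_weight = covariance(heights, weights)
```

A positive covariance says the two variables tend to be above or below their means together, and the covariance of a variable with itself is its variance.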
Useful properties of common functions
Logistic sigmoid function
The sigmoid function σ(x) = 1 / (1 + exp(-x)) is a mathematical function with a characteristic S-shaped curve
Whatever input we give to the sigmoid function, its output always lies between 0 and 1, i.e. in the open interval (0, 1)
Softplus function:
The output of the sigmoid function is bounded above and below, whereas the softplus function ζ(x) = log(1 + exp(x)) produces output in (0, +∞)
It is a smooth approximation of max(0, x) and can be used to constrain the output of a model to be always positive
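Both functions are short to write with the standard math module. The versions below are one possible numerically careful formulation (an assumption of this sketch, not something the slides specify) that avoids overflow for large positive or negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)          # safe because x < 0
    return e / (1.0 + e)


def softplus(x):
    """log(1 + exp(x)), rewritten as max(x, 0) + log(1 + exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


mid_point = sigmoid(0.0)       # centre of the S-shaped curve
soft_zero = softplus(0.0)      # log(2)
large_input = softplus(1000.0) # close to max(0, x) for large x
```

For large positive x the softplus output is essentially x, and for large negative x it approaches 0, which is the sense in which it smooths max(0, x).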
Bayes' theorem: it helps in determining the probability of an event given that some other event has already occurred.
If A and B are two events with P(B) > 0, then the formula for Bayes' theorem is
P(A|B) = P(B|A) P(A) / P(B)
where P(A|B) is the conditional probability that event A occurs given that event B has already occurred
From the definition of conditional probability, Bayes' theorem can be derived as
P(A|B) P(B) = P(A,B) = P(B|A) P(A)
Dividing both sides by P(B) gives P(A|B) = P(B|A) P(A) / P(B)
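A worked sketch of Bayes' theorem with exact fractions. The scenario (a condition with a 1% prior, a test that detects it 99% of the time and gives a false positive 5% of the time) uses hypothetical numbers chosen only to illustrate the formula; the function and parameter names are assumptions.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) by the law of total probability.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


posterior = bayes_posterior(
    prior=Fraction(1, 100),                 # P(A), hypothetical
    likelihood=Fraction(99, 100),           # P(B|A), hypothetical
    likelihood_given_not=Fraction(5, 100),  # P(B|not A), hypothetical
)
```

Even with an accurate test, the posterior stays modest because the event A was rare to begin with.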
Information theory is a mathematical approach to the study of the coding of information, along with its quantification, storage, and communication.
It provides a quantitative measure of the information contained in a message signal
Digital information is always associated with uncertainty
If the probability of occurrence of an event is very high, the information contained in that event is low
Ex: Tomorrow, the sun will rise in the east
If the probability of occurrence of an event is low, the information contained in the event is high
Ex: A solar eclipse will occur today
Consider a discrete random variable X with possible outcomes xi, i = 1, 2, …, n. The information of the event X = xi is defined as
I(xi) = -log2 P(X = xi), measured in bits when the logarithm is base 2
Consider a source that tosses a fair coin and produces an output equal to 1 if a head appears and 0 if a tail appears
P(1) = P(0) = 0.5
The information content of each output from the source is
I = -log2(0.5) = 1 bit
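The coin example can be reproduced with a one-line function. The name self_information is an assumption of this sketch; base-2 logarithms are used so the result is in bits.

```python
import math


def self_information(probability):
    """Information, in bits, of an event with the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


coin_bits = self_information(0.5)     # one fair coin toss
rare_bits = self_information(0.25)    # a less likely event carries more bits
certain_bits = self_information(1.0)  # a certain event carries no information
```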
Entropy: it tells how much information is present, on average, in the outcome of a random variable
We can measure the amount of uncertainty in an entire probability distribution using the Shannon entropy
H(X) = -Σ_x P(x) log P(x)
H(X) is the expected amount of information in an event drawn from the distribution
Kullback-Leibler (KL) divergence:
It is a measure of how one probability distribution P differs from a second distribution Q over the same variable
D_KL(P || Q) = Σ_x P(x) log( P(x) / Q(x) )
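A sketch of Shannon entropy and KL divergence for discrete distributions stored as dictionaries, using base-2 logarithms (bits). The function names and the two example distributions are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for distributions over the same outcomes."""
    total = 0.0
    for outcome, px in p.items():
        if px == 0:
            continue                 # 0 * log(0 / q) is taken to be 0
        qx = q.get(outcome, 0)
        if qx == 0:
            return math.inf          # P puts mass where Q has none
        total += px * math.log2(px / qx)
    return total


fair_coin = {"H": 0.5, "T": 0.5}
biased_coin = {"H": 0.9, "T": 0.1}

h_fair = entropy(fair_coin)
h_biased = entropy(biased_coin)
forward = kl_divergence(fair_coin, biased_coin)
backward = kl_divergence(biased_coin, fair_coin)
```

The divergence of a distribution from itself is 0, and swapping P and Q generally changes the value, so the KL divergence is not a symmetric distance.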
Structured probabilistic models: a structured probabilistic model represents the factorization of a probability distribution using a graph
It is a way of describing a probability distribution with a graph that shows which variables interact with each other directly
Directed: graphs with directed edges
Undirected: graphs with undirected edges
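Directed graphs:
The probability distribution over x is given by the product of one factor per variable, each conditioned on that variable's parents in the graph
p(x) = Π_i p(xi | Pa(xi)), where Pa(xi) denotes the parents of xi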
Undirected graphs:
Any set of nodes that are all connected to each other is called a clique
Each clique C in an undirected model is associated with a factor φ(C), and p(x) is proportional to the product of these factors

DL-unit-1.pptx

  • 2. Why linear algebra is useful In many machine learning and deep learning algorithms the input and output are both represented as vectors By vectors simple means a collection of numbers Convert a input ( such as picture, sound etc ) into a number
  • 3. Cont.. Scalars: single number or quantity which has an important property magnitude Ex: speed of a car ( speed = 45 km/hr) we write scalars in italics usually we denote with lower case variable names Ex: n ∈ N be a number of units
  • 4. Cont.. Vectors: A vector is an array of numbers Ex: let be an input vector X1, X2,….Xn put together to concatenate called it is as vector
  • 5. Cont.. Vectors: Magnitude and direction Ex: A car is travelling east at the speed of 45 km/hr
  • 6. Cont.. Matrices : A matrix is a 2-D array of numbers
  • 7. Cont.. Tensors: Array of numbers with dimensions greater than 2 multi dimensional array
  • 8. Cont... Multiplying matrices and vectors: To define multiplication between matrix A and vector x we need to view the vector as a column matrix.
  • 10. Cont.. Span and Linear dependence span : the span of a set of vector is the set of all vectors obtained by a linear combination of original vectors. every possible linear combination ex: The span of the coordinate vectors v1=(1,0) v2=(0,1)
  • 14. Cont... Linear dependence: A set of vectors is linearly dependent if the vectors can be written as a linear combination of the other vectors . ex: v1=(1,0) v2=(0,1) v3=(2,1) v3 can be represented as linear combination of 2V1+ V2
  • 15. Cont.. Norms: Norms are a way of measuring the length of vectors, matrices etc To estimate how big a vector or tensor To estimate how close one vector or tensor is to another
  • 17. Cont.. Mathematically a norm is any function f that satisfies
  • 18. Cont... Eigen decomposition : One of the most widly used matrix decomposition is Eigen decomposition In which we decompose a matrix into a set of eigen vectors and eigen values A vector that undergoes pure scaling without any rotation is know as eigen vector The scaling factor (stretch ratio) is known as eigen value.
  • 19. Con..
  • 21. Cont.. Singular value Decomposition A SVD is derived from Eigen decomposition The matrix A can be an mXn matrix which does not have to be square matrix.
  • 23. Cont.. The Moore-penrose pesudo inverse: It is used when the matrix may not be invertible. If A is invertible, then the moore-penrose is equal to matrix inverse The pseudo inverse is also referred to as the generalized inverse.
  • 24. Cont.. Let A be a matrix of order m x n then the pseudo inverse of A is defined as If the column of a matrix A are linearly independent then the pseudo inverse of A is A+ =(AT A-1) AT If the rows of the matrix are linearly independent then pseudo inverse of A is A+ = AT(AAT) -1
  • 25. Cont.. The Trace operator: The trace operator gives the sum of all of the diagonal entries of a matrix
  • 26. Cont.. The Determinant The determinant of a square matrix denoted det(A)
  • 27. Cont.. The Trace operator: The trace operator gives the sum of all of the diagonal entries of a matrix
  • 28. Cont.. Principle Component Analysis: It is a dimensionality reduction method that is often used to reduce the dimensionality of large datasets by transforming a large set of variables into smaller one that still contains most of the information in the large set.
  • 32. Probability and Information Theory Random variables: A variable whose value is determined by a random experiment It is defined over the sample space to real number X : S -> R X is random variable, S is sample space, R is real numbers Sample space is a collection or a set of possible outcomes of a random experiment
  • 33. Cont.. Types of random variables Discrete random variables: It takes only finite number of distinct values Ex: 0, 1,2,3,4.... Continuous random variables: Infinite and uncountable set of values Ex: Interest rates of loans in a country
  • 34. Probability and Information Theory Probability Distributions It gives the possibility of each outcome of random experiment or event
  • 35. Probability and Information Theory Probability Mass function: If X is a discrete random variable with distinct values x1, x2, .... xn then the function p(x) defined as Is called the probability mass function
  • 36. Probability and Information Theory Probability density function: A function f(x) is PDF if
  • 37. Probability and Information Theory Marginal Probability: The probability of an event irrespective of the outcome of another variable It is simply the distribution of each of these individual variables
  • 38. Probability and Information Theory For example, we would say that the marginal distribution of sports is: Baseball: 36 Basketball: 31 Football: 33
  • 39. Probability and Information Theory We could also write the marginal distribution of sports in percentage terms (i.e. out of the total of 100 respondents): Baseball: 36 / 100 = 36% Basketball: 31 / 100 = 31% Football: 33 / 100 = 33%
  • 40. Probability and Information Theory Conditional Probability: The probability of occurrence of any event A when another event B in relation to A has already occurred
  • 41. Probability and Information Theory The chain rule of conditional probability for three events A,B,C we have chain rule The chain rule can be generalized for ‘n’ number of events A1, A2, A3..... An,
  • 42. Probability and Information Theory Independence and conditional Independence Two random variables X and Y are said to be statistically independent if and only if P(X,Y) = P(X)P(Y) Ex: Independent – X: Throw of a dice Y: Toss of a coin Not independent : X: Height Y: Weight In general as height increases weight increases
  • 43. Probability and Information Theory Independence is equivalent to saying P(y | x) = P(y) or P(x | y) = P(x) That is, whether y happens or not has no relation to whether x happens
  • 44. Probability and Information Theory Conditional Independence Two random variables X and Y are said to be conditionally independent given Z if and only if P(x, y | z) = P(x | z) P(y | z) Ex: Both independent and conditionally independent: X: Throw of a die Y: Toss of a coin Z: Card from a deck
  • 45. Probability and Information Theory X: Height Y: Vocabulary Z: Age Here X and Y are not independent: if I say that a person is just 2 feet tall, that person is most probably a child and so has a small vocabulary. X and Y become independent only once a particular condition (the age) is given
  • 46. Probability and Information Theory Suppose I fix the age at 30: among people of age 30, vocabulary does not depend on height. In this case the two variables are not independent, but they are conditionally independent given age
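The height / vocabulary / age example can be turned into a small discrete model. The sketch below assumes made-up probabilities and a joint distribution built so that height and vocabulary depend only on age; it shows that the two are dependent marginally but factorize once age is given.

```python
# Hypothetical numbers: age group Z drives both height X and vocabulary Y.
p_z = {"child": 0.3, "adult": 0.7}
p_x_given_z = {"child": {"short": 0.9, "tall": 0.1},
               "adult": {"short": 0.2, "tall": 0.8}}
p_y_given_z = {"child": {"low": 0.8, "high": 0.2},
               "adult": {"low": 0.1, "high": 0.9}}

# Joint P(x, y, z), constructed so that X and Y are independent given Z.
joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for z in p_z for x in ("short", "tall") for y in ("low", "high")}

def p_xy(x, y):
    """Marginal P(x, y), summing out the age group."""
    return sum(joint[(x, y, z)] for z in p_z)

def p_x(x):
    return sum(p_xy(x, y) for y in ("low", "high"))

def p_y(y):
    return sum(p_xy(x, y) for x in ("short", "tall"))

# Marginally dependent: P(short, low) differs from P(short) * P(low).
print(p_xy("short", "low"), p_x("short") * p_y("low"))

# Conditionally independent: P(x, y | z) matches P(x | z) * P(y | z).
max_gap = max(abs(joint[(x, y, z)] / p_z[z]
                  - p_x_given_z[z][x] * p_y_given_z[z][y])
              for (x, y, z) in joint)
print(max_gap)
```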
  • 47. Probability and Information Theory Expectation: It gives the mean / average / expected value of a random variable under its distribution The distribution covers not only the values that have actually been observed but all possible values Ex: Expected returns on a certain investment in the market; expected rainfall during the coming monsoon
  • 48. Probability and Information Theory The expectation or expected value of some function f(x) with respect to a probability distribution P(x) is the average of f(x) when x is drawn from P Denoted by E_{x~P}[f(x)] For a discrete variable, E_{x~P}[f(x)] = ∑_x P(x) f(x)
  • 49. Probability and Information Theory Multivariate Expectation: (multiple variables) For a multivariate random variable x (vector) we can interpret the variable by considering
  • 50. Probability and Information Theory Linearity of Expectation: If f is a linear combination of two other functions g and h, with scalars α and β, f(x) = α g(x) + β h(x), then the expectation of f is E[f] = α E[g] + β E[h]
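The discrete expectation formula and its linearity can be checked with exact fractions. This is a minimal sketch assuming a fair six-sided die as the distribution; the functions `g`, `h` and the scalars `alpha`, `beta` are arbitrary choices for illustration.

```python
from fractions import Fraction

# Fair six-sided die as the distribution P(x).
P = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(f, dist):
    """E_{x~P}[f(x)] = sum over x of P(x) * f(x) for a discrete distribution."""
    return sum(p * f(x) for x, p in dist.items())

def g(x):
    return x

def h(x):
    return x * x

alpha, beta = 2, 3  # arbitrary scalars

def f(x):
    return alpha * g(x) + beta * h(x)

lhs = expectation(f, P)                                      # E[f]
rhs = alpha * expectation(g, P) + beta * expectation(h, P)   # alpha E[g] + beta E[h]
print(lhs, rhs)
```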
  • 51. Probability and Information Theory Variance: It measures how far the values of a random variable spread around its expected value: Var(x) = E[(x − E[x])²] Variance also measures the amount of fluctuation of the variable Ex: Variance of returns on a certain investment in the market (a risk measure)
  • 53. Probability and Information Theory Covariance: This is defined for a pair of variables x and y It measures how much the two random variables vary together around their expected values: Cov(x, y) = E[(x − E[x])(y − E[y])]
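Variance and covariance can be computed directly from their definitions as expectations. The sketch below assumes a small hypothetical data set of heights and weights in which every observation is equally likely, so each expectation reduces to a plain average (population formulas, dividing by n).

```python
def mean(values):
    """Average of equally likely values."""
    return sum(values) / len(values)

def variance(xs):
    """Var(x) = E[(x - E[x])^2]."""
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def covariance(xs, ys):
    """Cov(x, y) = E[(x - E[x]) * (y - E[y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

heights = [150, 160, 170, 180, 190]  # hypothetical values, in cm
weights = [50, 58, 66, 74, 82]       # hypothetical values, in kg

print(variance(heights))
print(covariance(heights, weights))  # positive: taller goes with heavier here
```

The covariance of a variable with itself is its variance, which gives a quick sanity check on both functions.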
  • 54. Probability and Information Theory Useful properties of common functions Logistic sigmoid function: The sigmoid function σ(x) = 1 / (1 + e^(−x)) is a mathematical function with a characteristic S-shaped curve Whatever input we give to the sigmoid function, its output lies between 0 and 1, i.e. in the interval (0, 1)
  • 55. Probability and Information Theory Softplus function: The output of the sigmoid function has both an upper and a lower limit, whereas the softplus function log(1 + e^x) produces output in (0, +∞) It is a smooth approximation of max(0, x) and can be used to constrain the output of a machine to be always positive
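Both functions are short to write down. The sketch below uses the standard formulas σ(x) = 1 / (1 + e^(−x)) and softplus(x) = log(1 + e^x), rearranged so that large positive or negative inputs do not overflow; the rearrangement is an implementation choice, not something stated in the slides.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # safe: x is negative here
    return z / (1.0 + z)

def softplus(x):
    """Softplus log(1 + exp(x)), a smooth approximation of max(0, x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), softplus(x))
```

For large positive x, softplus(x) is very close to x, and for large negative x it is very close to 0, which is why it behaves like a softened version of max(0, x).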
  • 57. Probability and Information Theory Bayes' theorem: It helps in determining the probability of an event based on some other event that has already occurred. If A and B are two events, then Bayes' theorem is given by P(A | B) = P(B | A) P(A) / P(B), where P(A | B) is the conditional probability of event A occurring given that event B has already occurred
  • 58. Probability and Information Theory From the definition of conditional probability, Bayes' theorem can be derived for events A and B as P(A, B) = P(A | B) P(B) = P(B | A) P(A), so P(A | B) = P(B | A) P(A) / P(B)
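As a worked instance of Bayes' theorem, the sketch below computes P(A | B) from P(A), P(B | A) and P(B | not A), expanding P(B) with the law of total probability. The numbers (a 1% prior and a test with 95% and 5% positive rates) are hypothetical, and the function name `bayes_posterior` is made up for illustration.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A).

    P(B) is expanded as P(B | A) P(A) + P(B | not A) P(not A).
    """
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical example: A = "has the condition", B = "test is positive".
posterior = bayes_posterior(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)
print(posterior)  # roughly 0.16: still unlikely despite the positive test
```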
  • 59. Probability and Information Theory Information theory is a mathematical approach to the study of the coding of information, along with its quantification, storage, and communication. It provides a quantitative measure of the information contained in a message signal Digital information is always associated with uncertainty If the probability of occurrence of an event is very high, the information contained in that event will be low
  • 60. Probability and Information Theory Ex: Tomorrow, the sun will rise in the east If the probability of occurrence of an event is low, the information contained in the event will be high Ex: A solar eclipse will occur today Consider a discrete random variable X with possible outcomes xi, i = 1, 2, ..., n. The information of the event X = xi is defined as I(xi) = −log2 P(xi), measured in bits
  • 61. Probability and Information Theory Consider a source which tosses a fair coin and produces an output equal to 1 if a head appears and 0 if a tail appears P(1) = P(0) = 0.5 The information content of each output from the source is −log2(0.5) = 1 bit
  • 62. Probability and Information Theory Entropy: It tells how much information is present, on average, in the outcome of a random variable We can measure the amount of uncertainty in an entire probability distribution using the Shannon entropy H(x) = −∑_x P(x) log P(x) H(x) is the expected amount of information in an event drawn from the distribution
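  • 63. Probability and Information Theory Kullback-Leibler divergence: It is a measure of how one probability distribution P differs from a second distribution Q: D_KL(P || Q) = ∑_x P(x) log (P(x) / Q(x)) It is non-negative, equal to zero when P and Q are the same, and not symmetric in P and Q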
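Self-information, Shannon entropy and the Kullback-Leibler divergence can be sketched in a few lines for discrete distributions. The code below uses base-2 logarithms, so results are in bits, and assumes distributions are given as lists of probabilities over the same outcomes, with Q(x) > 0 wherever P(x) > 0.

```python
import math

def information(p):
    """Self-information -log2(p) of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy H = -sum p log2 p; zero-probability outcomes contribute 0."""
    return sum(p * information(p) for p in dist if p > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum p log2(p / q), assuming q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]     # fair coin from the slides
biased = [0.9, 0.1]   # hypothetical biased coin

print(information(0.5))              # information of one fair coin toss
print(entropy(fair), entropy(biased))
print(kl_divergence(fair, biased), kl_divergence(biased, fair))
```

The two KL values differ, which illustrates that the divergence is not symmetric and so is not a true distance.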
  • 64. Probability and Information Theory Structured Probabilistic Models: A representation of the factorization of a probability distribution using a graph, i.e. a way of describing a probability distribution with a graph that shows which variables interact with each other directly Directed: graphs with directed edges Undirected: graphs with undirected edges
  • 65. Probability and Information Theory Directed graphs: The probability distribution over x is given by p(x) = ∏_i p(xi | Pa(xi)), where Pa(xi) denotes the parents of xi in the graph
  • 66. Probability and Information Theory Undirected graphs: Any set of nodes that are all connected to each other is called a clique Each clique C in an undirected model is associated with a factor φ(C)
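For a directed model, the joint distribution is the product of one conditional distribution per node given its parents. The sketch below assumes a hypothetical three-node chain a -> b -> c over binary variables with made-up conditional probability tables; `parents`, `cpt` and `joint` are illustrative names.

```python
from itertools import product

# Hypothetical chain a -> b -> c over binary variables.
parents = {"a": (), "b": ("a",), "c": ("b",)}

# cpt[var][parent_values] = P(var = 1 | parents = parent_values)
cpt = {
    "a": {(): 0.6},
    "b": {(0,): 0.2, (1,): 0.7},
    "c": {(0,): 0.1, (1,): 0.5},
}

def joint(assignment):
    """p(x) = product over nodes of p(x_i | parents of x_i)."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

# The factorized joint should sum to 1 over all assignments.
total = sum(joint(dict(zip("abc", values)))
            for values in product([0, 1], repeat=3))
print(joint({"a": 1, "b": 1, "c": 1}), total)
```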