Softmax
Dr. Sibt ul Hussain
Saif Ali Kheraj
August 18, 2019
Let us talk about softmax. We will use the CIFAR dataset to illustrate softmax and
its derivative.


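$$
\begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1,3073} \\
w_{21} & w_{22} & \cdots & w_{2,3073} \\
w_{31} & w_{32} & \cdots & w_{3,3073}
\end{bmatrix}
\cdot
\begin{bmatrix}
x_{11} \\ x_{12} \\ x_{13} \\ \vdots \\ x_{1,3073}
\end{bmatrix}
$$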
We will use a single example to illustrate; you can extend the concept to multiple
examples.
Weights: 3 x 3073
X: 3073 x 1
Taking the dot product gives a 3 x 1 matrix, which holds the unnormalised scores
in softmax terms.
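The score computation can be sketched in NumPy. This is only an illustration: the weights and the input are random placeholders rather than real CIFAR data, and reading 3073 as 32*32*3 pixel values plus one bias term is an assumption not stated in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 3       # three classes, as in the example above
num_features = 3073   # assumed: 32*32*3 pixel values plus one bias term

# Hypothetical random weights and one flattened example (not real CIFAR data).
W = 0.001 * rng.standard_normal((num_classes, num_features))  # 3 x 3073
x = rng.standard_normal((num_features, 1))                    # 3073 x 1

f = W @ x        # unnormalised class scores
print(f.shape)   # (3, 1)
```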
$$
f = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
$$
Note: Before exponentiating, you first have to clip the scores to avoid numerical instability:
the exponential of a large number x grows so quickly that it can overflow.
To normalise the scores we take the antilog, which is the exponential.


$$
\begin{bmatrix} \exp(a) \\ \exp(b) \\ \exp(c) \end{bmatrix} \Big/ \sum_j \exp(f_j)
$$
This gives us the normalised scores. Each raw score is interpreted as an unnormalised log
probability for its class, and softmax wants the probability of the correct class to be high.
A higher score for the correct class reduces the loss. You can check this by plotting the log
function: log is monotonically increasing, so we want to maximize the log likelihood or,
equivalently, minimize the negative log. Both approaches work: to maximize the likelihood we
use gradient ascent, and for the minimization problem we use gradient descent.
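The normalisation step can be sketched as follows. Note that this sketch does not clip the scores; it swaps in the common alternative of subtracting the largest score before exponentiating, which leaves the probabilities unchanged because the common factor cancels in the ratio. The score values are hypothetical.

```python
import numpy as np

def softmax(f):
    """Turn a vector of unnormalised scores into probabilities."""
    f = np.asarray(f, dtype=float)
    shifted = f - np.max(f)        # largest entry becomes 0, so exp cannot overflow
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([1.0, 2.0, 3.0])  # hypothetical values for a, b, c
probs = softmax(scores)
print(probs, probs.sum())

# A naive exp of scores this large would overflow; the shifted version stays finite.
large = softmax(np.array([1000.0, 1001.0, 1002.0]))
print(large)
```

Because softmax depends only on the differences between scores, both calls print the same probabilities.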
We will use the minimization approach, so our loss function is:
$$
L_i = -\log P(y_i) \tag{1}
$$
Note: You can also derive this formula.
Let us expand this:
$$
\begin{aligned}
L_i &= -\log\left(\frac{\exp(f_{y_i})}{\sum_j \exp(f_j)}\right) \\
L_i &= -f_{y_i} + \log \sum_j \exp(f_j)
\end{aligned}
\tag{2}
$$
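As a quick check that the expanded form agrees with equation (1), here is a minimal sketch with hypothetical scores. The function name `softmax_loss` and the zero-based class index are choices made for this illustration.

```python
import numpy as np

def softmax_loss(f, y):
    """Loss for one example: L_i = -f[y] + log(sum_j exp(f[j]))."""
    f = np.asarray(f, dtype=float)
    shifted = f - np.max(f)   # shift for stability; the loss value is unchanged
    return -shifted[y] + np.log(np.sum(np.exp(shifted)))

scores = np.array([1.0, 2.0, 3.0])  # hypothetical scores
y = 0                               # hypothetical correct class (class 1, zero-based)

loss_expanded = softmax_loss(scores, y)

# Same quantity via equation (1): minus the log of the correct-class probability.
probs = np.exp(scores) / np.sum(np.exp(scores))
loss_direct = -np.log(probs[y])
print(loss_expanded, loss_direct)
```

The loss is also smaller when the correct class has the highest score, for example `softmax_loss(scores, 2) < softmax_loss(scores, 0)`.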
Note: We will now calculate the derivative with respect to all the weights W. The loss has
two parts: the data loss and the regularisation loss. Regularisation prevents overfitting
and diffuses the weights, which is good for generalisation. You can add regularisation
(L1 or L2) to the loss function. Here we use only the data loss and calculate its derivative.
We will now use the example above to calculate the derivative, so we have a 3 x 1 score vector.
Let us say class 1 is the correct class. We plug this into the loss formula above.
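$$
\begin{aligned}
L_i = {} & -(w_{11}x_{11} + w_{12}x_{12} + \cdots + w_{1,3073}x_{1,3073}) \\
& + \log\bigl(\exp(w_{11}x_{11} + w_{12}x_{12} + \cdots + w_{1,3073}x_{1,3073}) \\
& \qquad + \exp(w_{21}x_{11} + w_{22}x_{12} + \cdots + w_{2,3073}x_{1,3073}) \\
& \qquad + \exp(w_{31}x_{11} + w_{32}x_{12} + \cdots + w_{3,3073}x_{1,3073})\bigr)
\end{aligned}
\tag{3}
$$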
The loss is now in expanded form, which makes it easy to calculate the derivative with
respect to each weight.
Since class 1 is the correct class, we first calculate the derivatives with respect to the first row of weights, w1:
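$$
\begin{aligned}
\frac{\partial L_i}{\partial w_{11}} &= -x_{11} + \frac{\exp(f_1)}{\sum_j \exp(f_j)}\, x_{11} \\
\frac{\partial L_i}{\partial w_{12}} &= -x_{12} + \frac{\exp(f_1)}{\sum_j \exp(f_j)}\, x_{12} \\
&\;\;\vdots \\
\frac{\partial L_i}{\partial w_{1,3073}} &= -x_{1,3073} + \frac{\exp(f_1)}{\sum_j \exp(f_j)}\, x_{1,3073}
\end{aligned}
\tag{4}
$$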
Let us now calculate the derivatives with respect to the second row of weights, w2:
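$$
\begin{aligned}
\frac{\partial L_i}{\partial w_{21}} &= \frac{\exp(f_2)}{\sum_j \exp(f_j)}\, x_{11} \\
\frac{\partial L_i}{\partial w_{22}} &= \frac{\exp(f_2)}{\sum_j \exp(f_j)}\, x_{12} \\
&\;\;\vdots \\
\frac{\partial L_i}{\partial w_{2,3073}} &= \frac{\exp(f_2)}{\sum_j \exp(f_j)}\, x_{1,3073}
\end{aligned}
\tag{5}
$$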
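The derivatives for the correct row and for an incorrect row follow one pattern, which can be written compactly. In this sketch, the symbol p_k for the softmax probability of class k, the column index m, and the indicator notation are introduced here for convenience and do not appear in the original notes.

```latex
p_k = \frac{\exp(f_k)}{\sum_j \exp(f_j)},
\qquad
\frac{\partial L_i}{\partial w_{km}} = \bigl(p_k - \mathbb{1}[k = y_i]\bigr)\, x_{1m}
```

For the correct class (k = y_i) the indicator contributes the extra -x_{1m} term; for every other class only the p_k x_{1m} term remains.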
Note: You can now easily calculate the derivative with respect to w3 in the same way. This is
for a single example x11, x12, ..., x13073. You can extend the concept to multiple examples.
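The whole derivation for a single example can be checked numerically. The sketch below is an illustration under assumptions: random placeholder weights and input, a zero-based class index, and a helper named `loss_and_grad` that is not from the notes. It computes the gradient for all three rows at once and compares one entry with a finite-difference estimate.

```python
import numpy as np

def loss_and_grad(W, x, y):
    """Softmax data loss and its gradient for one example.

    W: (C, D) weights, x: (D, 1) example, y: index of the correct class.
    """
    f = (W @ x).ravel()                  # class scores, shape (C,)
    f = f - np.max(f)                    # shift for numerical stability
    p = np.exp(f) / np.sum(np.exp(f))    # softmax probabilities
    loss = -np.log(p[y])
    dscores = p.copy()
    dscores[y] -= 1.0                    # extra -1 on the correct class row
    dW = np.outer(dscores, x.ravel())    # row k is (p_k - [k == y]) * x
    return loss, dW

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((3, 3073))  # hypothetical weights
x = rng.standard_normal((3073, 1))         # hypothetical example
y = 0                                      # class 1 is correct (zero-based)

loss, dW = loss_and_grad(W, x, y)

# Central finite-difference check on one weight of the second row.
h = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 5] += h
W_minus[1, 5] -= h
numeric = (loss_and_grad(W_plus, x, y)[0] - loss_and_grad(W_minus, x, y)[0]) / (2 * h)
print(dW[1, 5], numeric)
```

The analytic and numeric values should agree closely. For multiple examples, one option is to average these per-example gradients over the batch.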