The partial derivative of the binary cross-entropy loss function
In order to find the partial derivative of the cost function $J$ with respect to a particular weight $w_j$, we apply the chain rule to each summand of $J$ as follows:
$$\frac{\partial J}{\partial w_j} = -\frac{1}{N}\sum_{i=1}^{N}\frac{\partial \ell_i}{\partial p_i}\,\frac{\partial p_i}{\partial z_i}\,\frac{\partial z_i}{\partial w_j}$$

with

$$J = -\frac{1}{N}\sum_{i=1}^{N}\ell_i, \qquad \ell_i = y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i),$$

where $\ell_i$ is the loss contribution of the $i$-th sample,
and

$$p_i = \frac{1}{1 + e^{-z_i}} \qquad \text{and} \qquad z_i = X_i w^T + b,$$

where $X_i$ denotes the $i$-th row of the design matrix
$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \dots & x_{1,n} \\ x_{2,1} & x_{2,2} & \dots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \dots & x_{N,n} \end{pmatrix},$$

where $n$ denotes the number of independent variables and $N$ the number of samples, $w = (w_0, \dots, w_{n-1})$ is the weight vector, and $b$ is a scalar bias term.
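To make these definitions concrete, here is a minimal NumPy sketch of the forward pass. The data and shapes are made up for illustration; only the formulas for $z_i$, $p_i$ and $J$ come from the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 5, 3                       # N samples, n independent variables (made-up sizes)
X = rng.normal(size=(N, n))       # design matrix, one row X_i per sample
y = rng.integers(0, 2, size=N)    # binary labels y_i
w = rng.normal(size=n)            # weight vector
b = 0.1                           # scalar bias term

z = X @ w + b                     # z_i = X_i w^T + b, vectorized over all i
p = 1.0 / (1.0 + np.exp(-z))      # p_i = 1 / (1 + e^(-z_i))

# J = -(1/N) * sum_i [ y_i ln(p_i) + (1 - y_i) ln(1 - p_i) ]
J = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(J)
```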
Note that $p_i$ is a sigmoid function. The derivative of the sigmoid function is given by [1]:

$$\frac{\partial \sigma(x)}{\partial x} = \sigma(x)(1 - \sigma(x))$$
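As a quick sanity check of this identity (a sketch, not part of the derivation itself), one can compare $\sigma(x)(1 - \sigma(x))$ against a central finite difference at an arbitrary test point:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6                                  # arbitrary test point and step size
analytic = sigmoid(x) * (1.0 - sigmoid(x))          # sigma(x) * (1 - sigma(x))
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(analytic, numeric)                            # the two values agree to roughly 1e-10
```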
Since the derivative of the natural logarithm is [2]

$$\frac{\partial \ln(x)}{\partial x} = \frac{1}{x},$$

and since $\frac{\partial z_i}{\partial w_j} = x_{i,j}$ (the $j$-th feature of the $i$-th sample), we can now solve the equation above:
$$\begin{aligned}
\frac{\partial J}{\partial w_j} &= -\frac{1}{N}\sum_{i=1}^{N}\frac{\partial \ell_i}{\partial p_i}\,\frac{\partial p_i}{\partial z_i}\,\frac{\partial z_i}{\partial w_j} \\
&= -\frac{1}{N}\sum_{i=1}^{N}\left[\frac{y_i}{p_i} + \frac{1 - y_i}{1 - p_i}(-1)\right]\left[p_i(1 - p_i)\right]x_{i,j} \\
&= -\frac{1}{N}\sum_{i=1}^{N}\left[y_i(1 - p_i) - (1 - y_i)p_i\right]x_{i,j} \\
&= -\frac{1}{N}\sum_{i=1}^{N}(y_i - p_i)\,x_{i,j} \\
&= \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)\,x_{i,j}
\end{aligned}$$
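In vectorized form this result reads $\frac{\partial J}{\partial w} = \frac{1}{N} X^T (p - y)$. Here is a minimal, self-contained sketch (with made-up data) that computes this gradient and verifies one component against a finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(X, y, w, b):
    """Binary cross-entropy J as defined above."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
N, n = 5, 3
X = rng.normal(size=(N, n))
y = rng.integers(0, 2, size=N).astype(float)
w = rng.normal(size=n)
b = 0.1

p = sigmoid(X @ w + b)
grad_w = X.T @ (p - y) / N        # component j is (1/N) * sum_i (p_i - y_i) x_{i,j}

# finite-difference check of dJ/dw_0
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += eps
w_minus[0] -= eps
numeric = (loss(X, y, w_plus, b) - loss(X, y, w_minus, b)) / (2 * eps)
print(grad_w[0], numeric)         # the two values agree to roughly 1e-9
```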
The partial derivative of the cost function $J$ with respect to the bias $b$ can be calculated accordingly. Since the derivation is very similar, except that $\frac{\partial z_i}{\partial b} = 1$, we can simply write:

$$\frac{\partial J}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)$$
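The bias gradient is therefore just the mean residual $p_i - y_i$. A self-contained sketch continuing the same illustrative setup, again checked against a finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, n = 5, 3
X = rng.normal(size=(N, n))
y = rng.integers(0, 2, size=N).astype(float)
w = rng.normal(size=n)
b = 0.1

p = sigmoid(X @ w + b)
grad_b = np.mean(p - y)           # dJ/db = (1/N) * sum_i (p_i - y_i)

def loss(b_):
    p_ = sigmoid(X @ w + b_)
    return -np.mean(y * np.log(p_) + (1 - y) * np.log(1 - p_))

eps = 1e-6
numeric = (loss(b + eps) - loss(b - eps)) / (2 * eps)   # finite-difference check
print(grad_b, numeric)            # the two values agree to roughly 1e-9
```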
References:
[1] http://www.ai.mit.edu/courses/6.892/lecture8-html/sld015.htm
[2] https://www.onlinemathlearning.com/derivative-ln.html
T. Roeschl, November 10, 2020