Backpropagation Process in Deep Neural Network
Backpropagation is one of the important concepts of a neural network. Our task is to classify our data best. For this, we have to update the weights of parameter and bias, but how can we do that in a deep neural network? In the linear regression model, we use gradient descent to optimize the parameter. Similarly here we also use gradient descent algorithm using Backpropagation.
For a single training example, Backpropagation algorithm calculates the gradient of the error function. Backpropagation can be written as a function of the neural network. Backpropagation algorithms are a set of methods used to efficiently train artificial neural networks following a gradient descent approach which exploits the chain rule.
The main features of Backpropagation are the iterative, recursive and efficient method through which it calculates the updated weight to improve the network until it is not able to perform the task for which it is being trained. Derivatives of the activation function to be known at network design time is required to Backpropagation.
Now, how error function is used in Backpropagation and how Backpropagation works? Let start with an example and do it mathematically to understand how exactly updates the weight using Backpropagation.
Input values
X1=0.05
X2=0.10
Initial weight
W1=0.15 Â Â w5=0.40
W2=0.20 Â Â w6=0.45
W3=0.25 Â Â w7=0.50
W4=0.30 Â Â w8=0.55
Bias Values
b1=0.35 Â Â b2=0.60
Target Values
T1=0.01
T2=0.99
Now, we first calculate the values of H1 and H2 by a forward pass.
Forward Pass
To find the value of H1 we first multiply the input value from the weights as
               H1=x1×w1+x2×w2+b1
            H1=0.05×0.15+0.10×0.20+0.35
                  H1=0.3775
To calculate the final result of H1, we performed the sigmoid function as
We will calculate the value of H2 in the same way as H1
               H2=x1×w3+x2×w4+b1
            H2=0.05×0.25+0.10×0.30+0.35
                  H2=0.3925
To calculate the final result of H1, we performed the sigmoid function as
Now, we calculate the values of y1 and y2 in the same way as we calculate the H1 and H2.
To find the value of y1, we first multiply the input value i.e., the outcome of H1 and H2 from the weights as
               y1=H1×w5+H2×w6+b2
            y1=0.593269992×0.40+0.596884378×0.45+0.60
                  y1=1.10590597
To calculate the final result of y1 we performed the sigmoid function as
We will calculate the value of y2 in the same way as y1
               y2=H1×w7+H2×w8+b2
            y2=0.593269992×0.50+0.596884378×0.55+0.60
                  y2=1.2249214
To calculate the final result of H1, we performed the sigmoid function as
Our target values are 0.01 and 0.99. Our y1 and y2 value is not matched with our target values T1 and T2.
Now, we will find the total error, which is simply the difference between the outputs from the target outputs. The total error is calculated as
So, the total error is
Now, we will backpropagate this error to update the weights using a backward pass.
Backward pass at the output layer
To update the weight, we calculate the error correspond to each weight with the help of a total error. The error on weight w is calculated by differentiating total error with respect to w.
We perform backward process so first consider the last weight w5 as
From equation two, it is clear that we cannot partially differentiate it with respect to w5 because there is no any w5. We split equation one into multiple terms so that we can easily differentiate it with respect to w5 as
Now, we calculate each term one by one to differentiate Etotal with respect to w5 as
Putting the value of e-y in equation (5)
So, we put the values of in equation no (3) to find the final result.
Now, we will calculate the updated weight w5new with the help of the following formula
In the same way, we calculate w6new,w7new, and w8new and this will give us the following values
            w5new=0.35891648
            w6new=408666186
            w7new=0.511301270
            w8new=0.561370121
Backward pass at Hidden layer
Now, we will backpropagate to our hidden layer and update the weight w1, w2, w3, and w4 as we have done with w5, w6, w7, and w8 weights.
We will calculate the error at w1 as
From equation (2), it is clear that we cannot partially differentiate it with respect to w1 because there is no any w1. We split equation (1) into multiple terms so that we can easily differentiate it with respect to w1 as
Now, we calculate each term one by one to differentiate Etotal with respect to w1 as
We again split this because there is no any H1final term in Etoatal as
will again split because in E1 and E2 there is no H1 term. Splitting is done as
We again Split both because there is no any y1 and y2 term in E1 and E2. We split it as
Now, we find the value of by putting values in equation (18) and (19) as
From equation (18)
From equation (8)
From equation (19)
Putting the value of e-y2 in equation (23)
From equation (21)
Now from equation (16) and (17)
Put the value of in equation (15) as
We havewe need to figure outas
Putting the value of e-H1 in equation (30)
We calculate the partial derivative of the total net input to H1 with respect to w1 the same as we did for the output neuron:
So, we put the values of in equation (13) to find the final result.
Now, we will calculate the updated weight w1new with the help of the following formula
In the same way, we calculate w2new,w3new, and w4 and this will give us the following values
            w1new=0.149780716
            w2new=0.19956143
            w3new=0.24975114
            w4new=0.29950229
We have updated all the weights. We found the error 0.298371109 on the network when we fed forward the 0.05 and 0.1 inputs. In the first round of Backpropagation, the total error is down to 0.291027924. After repeating this process 10,000, the total error is down to 0.0000351085. At this point, the outputs neurons generate 0.159121960 and 0.984065734 i.e., nearby our target value when we feed forward the 0.05 and 0.1.