While looking at the mathematics of the Back Propagation Algorithm for a Multi-Layer Perceptron, I noticed that to find the partial derivative of the cost function with respect to a weight (say $w$) in any of the hidden layers, we just write the error at the final outputs in terms of the inputs and the hidden-layer weights, and then drop every term that does not contain $w$, since differentiating those terms with respect to $w$ gives zero.
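To make concrete what I mean, here is a small sketch for a network with one hidden unit and one output unit. The symbols $x$, $y$, $w_1$, $w_2$, $h$, $\hat{y}$, the activation $\sigma$, and the squared-error cost are my own assumptions for illustration, not notation from class:

$$
h = \sigma(w_1 x), \qquad \hat{y} = \sigma(w_2 h), \qquad E = \tfrac{1}{2}\,(\hat{y} - y)^2
$$

$$
\frac{\partial E}{\partial w_1} = (\hat{y} - y)\,\sigma'(w_2 h)\,w_2\,\sigma'(w_1 x)\,x
$$

Every factor here comes from differentiating the fully expanded expression $E = \tfrac{1}{2}\big(\sigma(w_2\,\sigma(w_1 x)) - y\big)^2$ directly, and I don't see a step in it that looks like an error being passed backwards.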
Where does the Back Propagation of error happen in this procedure? Done this way, I could compute the partial derivatives for the first hidden layer first and then move on to the later layers if I wanted to. Is there another way of doing the calculation in which the Back Propagation concept actually comes into play? Also, I’m looking for a general method/algorithm, not one that only works for 1-2 hidden layers.
I’m fairly new to this and I’m just following what’s being taught in class. Nothing I found on the internet seems to use clear, consistent notation, so I can’t follow what it is saying.