I’m currently trying to take the next step in deep learning. So far I have managed to write my own basic feed-forward network in Python without any frameworks (just NumPy and pandas), so I think I understand the math and intuition behind backpropagation. Now I’m stuck on deep Q-learning. I’ve tried to get an agent to learn in various environments, but somehow nothing works, so there must be something I’m getting wrong. At least that’s what I think: it seems I don’t understand the critical part correctly.
The screenshot is from this video.
What I’m trying to draw here is my understanding of the very basic process of a simple DQN. Assuming this is right, how is the loss backpropagated?
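For reference, here is a minimal NumPy sketch of the forward part of that process as I understand it, computing the loss for one mini-batch. Everything in it is hypothetical: a tiny linear Q-network, made-up transitions, and gamma = 0.99. For brevity it uses the same network for the next-state values, whereas the original DQN uses a separate, periodically updated target network for that step.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 4   # hypothetical state dimension
n_actions = 2    # hypothetical number of actions
gamma = 0.99     # assumed discount factor

# Hypothetical linear Q-network: Q(s, .) = s @ W + b, one output per action
W = rng.normal(scale=0.1, size=(n_features, n_actions))
b = np.zeros(n_actions)

def q_values(states):
    return states @ W + b

# A made-up mini-batch of transitions (s, a, r, s', done)
states = rng.normal(size=(3, n_features))
actions = np.array([0, 1, 1])
rewards = np.array([1.0, 0.0, -1.0])
next_states = rng.normal(size=(3, n_features))
dones = np.array([False, False, True])

q = q_values(states)                            # shape (batch, n_actions)
q_taken = q[np.arange(len(actions)), actions]   # Q(s, a) for the actions actually taken

# TD target r + gamma * max_a' Q(s', a'); no bootstrapping on terminal transitions.
# The target is treated as a constant, so no gradient flows through it.
q_next = q_values(next_states)
targets = rewards + gamma * q_next.max(axis=1) * (~dones)

# Mean squared TD error over the batch; only q_taken enters the loss
loss = np.mean((q_taken - targets) ** 2)
print(loss)
```

Only one output per sample (the one for the action that was taken) appears in the loss, which is exactly the situation the question below is about.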
Since only the selected Q(s, a) values (5 and 7) are further processed in the loss function, how is the impact of the other neurons calculated so that their weights can be adjusted to better predict the real Q-values?
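To make the question concrete, here is a small NumPy sketch of what backpropagating such a loss looks like, assuming a hypothetical one-hidden-layer ReLU Q-network, a single sample, and a made-up TD target of 2.0. The idea it illustrates is that the gradient of the loss with respect to the whole output vector is zero for every action that was not taken, and ordinary backprop then proceeds from that vector. So the output weights of the non-selected actions get no update from this sample, while the shared hidden-layer weights still do, because the selected Q(s, a) depends on them.

```python
import numpy as np

rng = np.random.default_rng(1)

n_features, n_hidden, n_actions = 4, 5, 3  # hypothetical sizes

# Hypothetical one-hidden-layer Q-network with a ReLU hidden layer
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
W2 = rng.normal(scale=0.5, size=(n_hidden, n_actions))

def forward(s, W1, W2):
    h_pre = s @ W1
    h = np.maximum(h_pre, 0.0)
    q = h @ W2                      # one Q-value per action
    return h_pre, h, q

def loss_fn(s, a, target, W1, W2):
    # Squared TD error on the taken action only
    _, _, q = forward(s, W1, W2)
    return 0.5 * (q[a] - target) ** 2

s = rng.normal(size=n_features)
a = 1          # action actually taken
target = 2.0   # hypothetical TD target, treated as a constant

h_pre, h, q = forward(s, W1, W2)

# dL/dq: the non-selected outputs get an error signal of exactly zero
dq = np.zeros(n_actions)
dq[a] = q[a] - target

# From here on it is ordinary backpropagation
dW2 = np.outer(h, dq)           # only column a is nonzero
dh = W2 @ dq                    # equals W2[:, a] * dq[a]
dh_pre = dh * (h_pre > 0)       # ReLU derivative
dW1 = np.outer(s, dh_pre)       # shared hidden weights still receive a gradient

print(dW2)
print(dW1)
```

In practice this masking usually happens implicitly: either the error vector is built with zeros everywhere except the taken action, or the target vector is set equal to the network's own predictions for the other actions, which makes their errors zero. Over many samples with different actions, each output gets trained on the transitions where its action was taken.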