What is the best way to train a neural network without a precomputed Q-table? Sometimes we can't find a Q-table that a neural network can fit, yet workable solutions exist that the Q-table doesn't capture, or we simply don't know how to compute the right Q-table (the space of possible solutions is very large). Is it possible to train the network directly from its selected actions and their cumulative rewards?
This can be done with gradient-free algorithms and a fitness objective function, but that approach is computationally expensive and often fails to find an optimal solution.
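To make concrete what I mean by "gradient-free with a fitness objective", here is a minimal sketch (the toy task and the `fitness` function are my own assumptions, just for illustration): simple hill climbing that perturbs the weights with noise and keeps a perturbation only if the fitness improves, never computing a gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: find a weight vector w that maximizes a
# fitness score (here, negative squared distance to a fixed target).
target = np.array([0.5, -0.3, 0.8])

def fitness(w):
    return -np.sum((w - target) ** 2)  # higher is better

# Gradient-free hill climbing: propose a random perturbation of the
# weights; accept it only when the fitness objective improves.
w = np.zeros(3)
for _ in range(2000):
    candidate = w + rng.normal(scale=0.1, size=w.shape)
    if fitness(candidate) > fitness(w):
        w = candidate
```

This illustrates the cost problem: every candidate requires a full fitness evaluation, and in a high-dimensional weight space random perturbations become very inefficient.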
How can we compute the neural network's error gradients directly from its selected actions and their cumulative rewards?
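To clarify the kind of update I have in mind, here is a sketch of a score-function (REINFORCE-style) update on a toy two-armed bandit; the softmax policy, reward probabilities, and learning rate are all my own assumptions. The parameters are adjusted in the direction of ∇ log π(a) scaled by the received reward, with no Q-table anywhere:

```python
import numpy as np

rng = np.random.default_rng(1)

true_rewards = np.array([0.2, 0.8])  # arm 1 pays off more often (assumed)
logits = np.zeros(2)                 # policy parameters
lr = 0.1

for _ in range(3000):
    # softmax policy over the two actions
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(2, p=probs)
    r = float(rng.random() < true_rewards[a])  # stochastic 0/1 reward

    # score-function gradient: d/d logits of log pi(a) for a softmax
    grad_logp = -probs
    grad_logp[a] += 1.0

    # move parameters up the estimated gradient of expected reward
    logits += lr * r * grad_logp

probs = np.exp(logits) / np.exp(logits).sum()
```

After training, the policy puts most of its probability on the better arm, even though no "correct action" was ever supplied, only sampled actions and their rewards. Is this the right way to think about it, and does it extend to networks with many parameters?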
Consider a human who, most of the time, doesn't compute the right answer in advance: he just tries things and learns from the outcomes of his behavior. Is there a way for a neural network to learn from its own behavior in the same sense, without the correct actions being computed in advance?