
Is this valid pseudocode for deep Q-learning?

I am not using replay here.

s – state
a – action
r – reward
n_s – next state
q_net – neural network representing q

    step() {
        get s, a, r, n_s
        q_target[s,a] = r + gamma * max(q_net[n_s,:])
        loss = mse(q_target, q_net[s,a])
        loss.backprop()
    }

    while (!terminal) {
        totalReturn += step();
    }
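
To make the pseudocode concrete, here is a minimal runnable sketch of the same online (no replay) update in Python. Several things in it are my own assumptions and not part of the pseudocode: the five-state chain environment (`env_step`), the epsilon-greedy action choice, the constants `GAMMA`, `ALPHA` and `EPSILON`, and a linear `q_net` over one-hot states standing in for a real neural network so the gradient can be written by hand. It also makes two details explicit that the pseudocode leaves open: the target is treated as a constant and uses only `r` on a terminal transition, and `step()` returns `r` so that it can be added to the total return.

```python
import random

GAMMA = 0.9      # discount factor (assumed value)
ALPHA = 0.1      # learning rate (assumed value)
EPSILON = 0.2    # exploration rate (assumed value)
N_STATES = 5     # hypothetical chain: states 0..4, state 4 is terminal
N_ACTIONS = 2    # 0 = left, 1 = right

# q_net stand-in: a linear model over one-hot state features,
# so weights[s][a] plays the role of q_net[s, a].
weights = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_values(s):
    """Return the list of Q estimates for every action in state s."""
    return weights[s]


def env_step(s, a):
    """Hypothetical environment: returns (reward, next_state, terminal)."""
    n_s = max(0, s - 1) if a == 0 else s + 1
    terminal = n_s == N_STATES - 1
    return (1.0 if terminal else 0.0), n_s, terminal


def choose_action(s, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    q = q_values(s)
    best = max(q)
    return rng.choice([a for a in range(N_ACTIONS) if q[a] == best])


def step(s, rng):
    """One interaction plus one gradient step, as in the pseudocode."""
    a = choose_action(s, rng)
    r, n_s, terminal = env_step(s, a)
    # The target is a constant: no gradient flows through it,
    # and there is no bootstrapping from a terminal state.
    q_target = r if terminal else r + GAMMA * max(q_values(n_s))
    td_error = q_target - weights[s][a]
    # For loss = 0.5 * td_error**2, d(loss)/d(weights[s][a]) = -td_error,
    # so one gradient descent step is:
    weights[s][a] += ALPHA * td_error
    return r, n_s, terminal


def run_episode(rng, max_steps=100):
    """Run until terminal (or max_steps) and return the undiscounted return."""
    s, total_return = 0, 0.0
    for _ in range(max_steps):
        r, s, terminal = step(s, rng)
        total_return += r
        if terminal:
            break
    return total_return


rng = random.Random(0)
returns = [run_episode(rng) for _ in range(500)]
```

After training, `weights[s][1]` (move right) should be larger than `weights[s][0]` in every non-terminal state of this toy chain. With a real neural network in an autodiff framework, the equivalent of the constant target is to compute it without tracking gradients.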