
Having trouble understanding how Double Deep Q-Networks work

I’ve looked at various articles and I’m still very confused. I understand normal double Q-learning: it keeps two action-value estimates, each updated from a different set of samples.
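For reference, here is a minimal sketch of that tabular update as I understand it. The function name, the dict-based tables, and the `rng` argument are my own assumptions for illustration, not taken from any library.

```python
import random

def double_q_update(q_a, q_b, s, a, r, s_next, n_actions,
                    alpha=0.1, gamma=0.99, rng=random):
    """One tabular double Q-learning step.

    q_a and q_b are dicts keyed by (state, action); missing entries count as 0.
    """
    # Flip a coin to decide which table is updated on this sample.
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a

    # The table being updated picks the greedy next action...
    best = max(range(n_actions), key=lambda x: select.get((s_next, x), 0.0))
    # ...but the other table supplies the value of that action.
    target = r + gamma * evaluate.get((s_next, best), 0.0)

    old = select.get((s, a), 0.0)
    select[(s, a)] = old + alpha * (target - old)
```

The point of the split is that the table choosing the action is not the one scoring it, so noise that inflates one estimate is less likely to inflate the target as well.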

But when it comes to neural networks, I’m confused.
The normal DQN algorithm uses the target network for both action selection and evaluation when computing the update target.
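To make the comparison concrete, here is a rough sketch of the two targets as I understand them, for a single transition. The function names and the example Q-values are made up for illustration; the arrays stand in for each network's outputs at the next state.

```python
import numpy as np

def dqn_target(r, q_target_next, gamma=0.99, done=False):
    # Standard DQN: the target network both picks the action (via max)
    # and supplies that action's value.
    return r + (0.0 if done else gamma * float(np.max(q_target_next)))

def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99, done=False):
    # Double DQN: the online network picks the action,
    # the target network evaluates it.
    a_star = int(np.argmax(q_online_next))
    return r + (0.0 if done else gamma * float(q_target_next[a_star]))

# Hypothetical Q-values at the next state for three actions.
q_online_next = np.array([1.0, 2.0, 0.5])
q_target_next = np.array([3.0, 1.5, 0.2])

y_dqn = dqn_target(1.0, q_target_next, gamma=0.5)
y_ddqn = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5)
```

In this made-up case the two networks disagree about the best action, so the Double DQN target uses the target network's value for the online network's choice rather than the target network's own maximum.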

I was told that the networks are initialized with small weights to prevent overestimation. If that’s the case, why exactly are we now using the online network to select actions?

I mean, the target network could handle that, since it would never overestimate values.

Can someone shed more light on this?

Thank you in advance!
