Relationship between training accuracy and validation accuracy

During model training, I noticed varying behaviour between training and validation accuracy. I understand that 'The training set is used to train the model, while the validation set is only used to evaluate the model's performance…', but I'd like to know whether there is any relationship between training and validation accuracy and, if so,…

Does adding a constant to all rewards change the set of optimal policies in episodic tasks?

I'm taking a Coursera course on reinforcement learning. A question came up there that wasn't addressed in the learning material: does adding a constant to all rewards change the set of optimal policies in episodic tasks? The answer is yes: adding a constant to the reward signal can make longer episodes more or less…
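A minimal sketch of why the answer is yes: in an episodic task, adding a constant $c$ to every reward adds $c \times T$ to the return of an episode of length $T$ (undiscounted case), so policies that induce episodes of different lengths can swap ordering. The episodes and names below are illustrative, not from the course material.

```python
# Hedged sketch: effect of adding a constant c to every reward in an episodic task.
# Episode reward sequences here are hypothetical examples.

def shifted_return(rewards, c, gamma=1.0):
    """Return of an episode after adding constant c to each per-step reward."""
    return sum((r + c) * gamma**t for t, r in enumerate(rewards))

# Two hypothetical episodes under two different policies:
short_ep = [1.0]             # terminates after 1 step, return = 1.0
long_ep = [0.0, 0.0, 0.9]    # terminates after 3 steps, return = 0.9

# With c = 0 the short episode is preferable; with c = 0.5 the long one wins,
# because the shift contributes c * (episode length) to each return.
print(shifted_return(short_ep, 0.0), shifted_return(long_ep, 0.0))  # 1.0 0.9
print(shifted_return(short_ep, 0.5), shifted_return(long_ep, 0.5))  # 1.5 2.4
```

This is exactly the "can make longer episodes more or less valuable" effect: a positive constant rewards staying in the episode longer, a negative one penalizes it.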

Why is the stationary state distribution independent of the initial state in the policy gradient theorem proof?

I was going through the proof of the policy gradient theorem here: https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#svpg In the section “Proof of Policy Gradient Theorem” in the block of equations just under the sentence “The nice rewriting above allows us to exclude the derivative of Q-value function…” they set $$ \eta (s) = \sum^\infty_{k=0} \rho^\pi(s_0 \rightarrow s, k) $$…
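For context on the quantity being asked about: in that proof, $\eta(s)$ counts the (discounted) visitation of state $s$ from the start state, and the referenced derivation then normalizes it into a distribution over states. A minimal restatement of that normalization step, using the same symbols:

```latex
% eta(s): expected number of visits to s, summed over all step counts k
\eta(s) = \sum_{k=0}^{\infty} \rho^\pi(s_0 \rightarrow s, k)
% Normalizing eta gives the stationary (on-policy) state distribution
d^\pi(s) = \frac{\eta(s)}{\sum_{s'} \eta(s')}
```

The question in the title is essentially why $d^\pi(s)$, so defined, can be treated as not depending on $s_0$.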