Given a discrete, finite Markov Decision Process (MDP) with its usual parameters $(S, A, T, R, \gamma)$, it is possible to obtain the optimal policy $\pi^{*}$ and the optimal value function $V^{*}$ through one of many planning methods (policy iteration, value iteration, or solving a linear program). I am interested in obtaining a random near-optimal policy $\pi$, with the value function associated with the policy given by $V^{\pi}$, such that $$ \epsilon_1 < ||V^{*} - V^{\pi}||_{\infty} < \epsilon_2$$ I wish to know an efficient way of achieving this goal.

A possible approach: the idea that near-optimal value functions induce near-optimal policies could be used. We can show that, if $$||V - V^{*}||_{\infty} < \epsilon, \quad \epsilon > 0$$ and if $\pi$ is the policy that is greedy with respect to the value function $V$, then $$ ||V^{\pi} - V^{*}||_{\infty} < \frac{2\gamma\epsilon}{1 - \gamma}$$ So by picking a suitable $\epsilon$ for the given $\gamma$, we can be sure of any upper bound $\epsilon_2$. However, I would also like that the policy $\pi$ not be "too good", hence the requirement for a lower bound. Any inputs regarding an efficient solution or reasons for the lack thereof are welcome.
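To make the upper-bound half of the approach concrete, here is a minimal sketch on a hypothetical toy 2-state, 2-action MDP (the MDP, its rewards, and the helper names are all assumptions for illustration, not part of the question). It runs value iteration, stops once $||V - V^{*}||_{\infty} < \epsilon$, acts greedily with respect to the coarse $V$, evaluates $V^{\pi}$ exactly, and checks the bound $||V^{\pi} - V^{*}||_{\infty} < 2\gamma\epsilon/(1-\gamma)$. Note that on small MDPs the observed gap is often exactly $0$ (the greedy policy is already optimal), which is precisely why guaranteeing the *lower* bound $\epsilon_1$ is the hard part.

```python
import numpy as np

# Hypothetical toy MDP: T[s, a, s'] transition probabilities, R[s, a] rewards.
gamma = 0.9
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def value_iteration(tol=1e-10):
    """Run value iteration to (near) convergence; returns V*."""
    V = np.zeros(T.shape[0])
    while True:
        V_new = (R + gamma * T @ V).max(axis=1)  # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def greedy_policy(V):
    """Policy that is greedy with respect to the value function V."""
    return (R + gamma * T @ V).argmax(axis=1)

def policy_value(pi):
    """Exact V^pi by solving the linear system (I - gamma * T_pi) V = R_pi."""
    n = T.shape[0]
    T_pi = T[np.arange(n), pi]
    R_pi = R[np.arange(n), pi]
    return np.linalg.solve(np.eye(n) - gamma * T_pi, R_pi)

V_star = value_iteration()

# Stop value iteration early at a coarse V with ||V - V*||_inf < eps
# (for illustration we compare against the precomputed V* directly).
eps = 0.5
V = np.zeros(T.shape[0])
while np.max(np.abs(V - V_star)) >= eps:
    V = (R + gamma * T @ V).max(axis=1)

pi = greedy_policy(V)
gap = np.max(np.abs(policy_value(pi) - V_star))  # ||V^pi - V*||_inf
assert gap < 2 * gamma * eps / (1 - gamma)       # the upper bound from above
```

This only certifies the upper bound $\epsilon_2$; nothing in the construction prevents `gap` from collapsing to $0$, so it does not by itself enforce $\epsilon_1 < ||V^{*} - V^{\pi}||_{\infty}$.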