I’m reading an article on reinforcement learning, and I don’t understand why the agent’s policy $\pi$ is not part of the definition of a Markov decision process (MDP):
Buşoniu, Lucian, Robert Babuška, and Bart De Schutter. “A comprehensive survey of multiagent reinforcement learning.” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38.2 (2008): 156-172.
My question is:
Why is the policy not part of the MDP definition?