### Optimal RL function approximation for TicTacToe game

I modeled the TicTacToe game as an RL problem, with an environment and an agent. At first I made an “Exact” agent: using the SARSA algorithm, I saved every unique state and chose the best (available) action given that state. I had two agents learn by competing against each other. The agents learned…
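For readers less familiar with the setup, here is a minimal sketch of what such an “Exact” (tabular) SARSA agent can look like. It is not the code from this post. The board encoding (a tuple of 9 cells, `0` empty, `1` X, `2` O), the class name `SarsaAgent`, and the `alpha`/`gamma`/`epsilon` values are all hypothetical choices for illustration. The agent keeps one table entry per (state, action) pair, picks actions epsilon-greedily among the empty cells only, and applies the SARSA update `Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))`.

```python
import random
from collections import defaultdict


class SarsaAgent:
    """Hypothetical tabular SARSA agent for TicTacToe (illustrative sketch).

    Assumed state encoding: tuple of 9 cells, 0 = empty, 1 = X, 2 = O.
    Actions are board indices 0..8.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(float)  # (state, action) -> value, defaults to 0.0
        self.alpha = alpha           # learning rate (assumed value)
        self.gamma = gamma           # discount factor (assumed value)
        self.epsilon = epsilon       # exploration rate (assumed value)
        self.rng = random.Random(seed)

    @staticmethod
    def available_actions(state):
        # Only empty cells are legal moves.
        return [i for i, cell in enumerate(state) if cell == 0]

    def choose_action(self, state):
        # Epsilon-greedy, restricted to the available actions.
        actions = self.available_actions(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        best = max(self.q[(state, a)] for a in actions)
        # Break ties randomly among equally valued actions.
        return self.rng.choice([a for a in actions if self.q[(state, a)] == best])

    def update(self, state, action, reward, next_state, next_action, done):
        # SARSA target: r + gamma * Q(s', a'), or just r if the game ended.
        target = reward
        if not done:
            target += self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


if __name__ == "__main__":
    agent = SarsaAgent(seed=0)
    empty = (0,) * 9
    a = agent.choose_action(empty)
    after = tuple(1 if i == a else 0 for i in range(9))
    # Pretend this move ended the game with reward 1.0.
    agent.update(empty, a, 1.0, after, None, done=True)
    print(agent.q[(empty, a)])  # 0.1
```

In two-agent self-play, each agent's `next_state` would usually be the board after the opponent has replied, so the environment loop has to hold each agent's previous (state, action) until its next turn; that bookkeeping is left out of this sketch. Because the table has an entry for every distinct board, this approach does not generalize between similar positions, which is what motivates function approximation.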

