Unable to understand $V^*$ at an infinite time horizon using the Bellman equation for solving an MDP

I’ve been following Berkeley’s CS188 assignments (I’m not taking the course). Currently, Gradescope doesn’t show the solution unless I get it correct. My reasoning was that $V^*(a) = 10$, fixed, because the optimal action is to terminate and receive the reward of 10, and that $V^*(b) = 10 \times 0.2 = 2$ by Bellman optimality…
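
Since the assignment’s MDP isn’t reproduced in the question, here is a minimal value-iteration sketch of the Bellman optimality update $V^*(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^*(s')]$ on a hypothetical two-state MDP built only to mirror the numbers above (an exit reward of 10 from $a$, a 0.2 transition probability from $b$); all state names, actions, rewards, and probabilities are assumptions for illustration, not the assignment’s actual MDP:

```python
# Value iteration on a toy MDP (hypothetical; mirrors the question's numbers).
# transitions[state][action] = list of (probability, next_state, reward);
# next_state None means the episode terminates.
transitions = {
    "a": {"exit": [(1.0, None, 10.0)]},                # terminate, collect 10
    "b": {"go":   [(0.2, "a", 0.0), (0.8, "b", 0.0)]}, # reach a w.p. 0.2
}
gamma = 1.0  # assume no discounting, matching the V*(b) = 10 * 0.2 reasoning

V = {"a": 0.0, "b": 0.0}
for _ in range(1000):  # iterate the Bellman optimality update to a fixed point
    new_V = {}
    for s, actions in transitions.items():
        q_values = []
        for outcomes in actions.values():
            # Q(s, a) = sum over outcomes of p * (r + gamma * V(s'))
            q = sum(p * (r + gamma * (V[s2] if s2 is not None else 0.0))
                    for p, s2, r in outcomes)
            q_values.append(q)
        new_V[s] = max(q_values)  # V(s) = max over actions of Q(s, a)
    V = new_V

print(V)
```

In this toy MDP the iteration converges to $V^*(a) = 10$ and $V^*(b) = 10$ rather than 2, because with an infinite horizon and no discount the agent can keep retrying from $b$ until it reaches $a$; $10 \times 0.2$ is only the one-step value.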

How does the BERT model (in the TensorFlow or PaddlePaddle frameworks) relate to the nodes of the underlying neural network that’s being trained?

The BERT model in frameworks like TensorFlow/PaddlePaddle shows various kinds of computation nodes (subtract, accumulate, add, multiply, etc.) in graph form across 12 layers. But this graph doesn’t look anything like the neural network typically shown in textbooks (e.g. https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Colored_neural_network.svg), where each edge has a weight that’s being trained…
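
For intuition about how the two pictures relate, here is a minimal TensorFlow 2.x sketch (not BERT itself; the layer sizes are arbitrary assumptions). The whole bundle of textbook “weighted edges” feeding one layer is stored as a single weight matrix, so it appears in the traced computation graph as one matmul node plus one add node rather than as individual edges:

```python
import tensorflow as tf

# Textbook picture: 3 input neurons, 4 hidden neurons, 12 weighted edges.
# Framework picture: those 12 edges are a single 3x4 weight tensor.
W = tf.Variable(tf.random.normal([3, 4]))  # all "edge weights" in one matrix
b = tf.Variable(tf.zeros([4]))             # one trainable bias per neuron

@tf.function  # traces the Python code into a computation graph
def hidden_layer(x):
    # matmul node = every edge-weight multiplication at once;
    # add node    = applying the biases; relu node = the activations
    return tf.nn.relu(tf.matmul(x, W) + b)

x = tf.constant([[1.0, 2.0, 3.0]])  # a batch of one 3-feature input
print(hidden_layer(x))              # shape (1, 4): the hidden activations
```

BERT’s 12 transformer layers are built from the same primitives: each attention or feed-forward sublayer is a handful of matmul/add/softmax nodes whose weight tensors are the parameters being trained, which is why the graph view shows arithmetic ops rather than the neuron-and-edge diagram.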