### How can policy gradient methods enforce the required action range?

We often use a Gaussian policy for continuous action spaces, but how can we make the sampled actions respect the environment's action range? In Spinning Up's implementation, I found they simply use:

```python
def mlp_gaussian_policy(x, a, hidden_sizes, activation, output_activation, action_space):
    act_dim = a.shape.as_list()[-1]
    mu = mlp(x, list(hidden_sizes)+[act_dim], activation, output_activation)
    log_std = tf.get_variable(name='log_std', initializer=-0.5*np.ones(act_dim, dtype=np.float32))
    std = tf.exp(log_std)
    pi = mu + tf.random_normal(tf.shape(mu)) * std
    ...
```

Nothing here clips or squashes `pi`, so the sampled action can fall outside the bounds of `action_space`.
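For context, the two workarounds I have seen elsewhere are (a) clipping the sampled action to the bounds before calling `env.step`, and (b) squashing it with `tanh` and rescaling into `[low, high]`, as SAC does. Below is a minimal NumPy sketch of the squashing idea; `squash_to_range` is a hypothetical helper for illustration, and I am assuming a `Box` action space with per-dimension `low`/`high` bounds.

```python
import numpy as np

def squash_to_range(raw_action, low, high):
    """Map an unbounded Gaussian sample into [low, high] via tanh rescaling.

    Hypothetical helper for illustration; not part of Spinning Up's code.
    """
    squashed = np.tanh(raw_action)                       # now in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)   # rescale to (low, high)

# Example: a 2-D action space with bounds [-2, 2] in each dimension
low, high = np.array([-2.0, -2.0]), np.array([2.0, 2.0])
mu = np.zeros(2)
std = np.exp(-0.5 * np.ones(2))         # same log_std init as the snippet above
raw = mu + std * np.random.randn(2)     # unbounded Gaussian sample, like pi
print(squash_to_range(raw, low, high))  # guaranteed inside [low, high]
```

Note that if you squash the action like this, the log-probability used in the policy-gradient loss needs a tanh Jacobian correction (this is what SAC's implementation does); plain clipping, by contrast, leaves the log-probability untouched but biases samples toward the boundary.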
