How can policy gradient fulfill the required action range?

We often use a Gaussian policy for continuous action spaces, but how can we make the sampled actions respect the required range? In Spinning Up's implementation, I found they simply use:

    def mlp_gaussian_policy(x, a, hidden_sizes, activation, output_activation, action_space):
        act_dim = a.shape.as_list()[-1]
        mu = mlp(x, list(hidden_sizes)+[act_dim], activation, output_activation)
        log_std = tf.get_variable(name='log_std', initializer=-0.5*np.ones(act_dim, dtype=np.float32))
        std = tf.exp(log_std)
        pi…
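Nothing in that snippet bounds mu or the sampled action. One common fix (used in SAC-style implementations, not in the Spinning Up snippet above) is to squash the Gaussian sample with tanh, rescale it to the action bounds, and correct the log-probability for the change of variables. Below is a minimal NumPy sketch; the function name squash_to_range and the bounds low/high are made up for illustration:

    import numpy as np

    def squash_to_range(mu, log_std, low, high, rng=np.random.default_rng()):
        # Sample from the unbounded Gaussian policy.
        std = np.exp(log_std)
        u = mu + std * rng.standard_normal(mu.shape)
        # Squash into (-1, 1), then rescale to [low, high].
        t = np.tanh(u)
        action = low + 0.5 * (t + 1.0) * (high - low)
        # Gaussian log-likelihood of the pre-squash sample u.
        logp_u = -0.5 * np.sum(((u - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
        # Change-of-variables correction for the tanh; the constant Jacobian of the
        # affine rescaling is omitted since it does not affect policy gradients.
        logp = logp_u - np.sum(np.log(1.0 - t ** 2 + 1e-6))
        return action, logp

    # Example: a 2-D action space bounded by [-2, 2] in each dimension.
    action, logp = squash_to_range(mu=np.zeros(2), log_std=-0.5 * np.ones(2), low=-2.0, high=2.0)

A simpler alternative that many implementations rely on is to leave the policy unbounded and clip the sampled action to the valid range just before stepping the environment.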

Why can’t LSTMs keep track of the “important parts” of a sequence?

I keep reading about how LSTMs can’t remember the “important parts” of a sequence, which is why attention-based mechanisms are required. I was trying to use LSTMs to identify the format of people’s names. For example, “Millie Bobby Brown” can be seen as the first_name middle_name last_name format, which I’ll denote as 0, but then there’s “Brown,…
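For context, the setup being described is a sequence classifier over name tokens. A minimal sketch of that kind of model, with a made-up toy encoding, toy labels, and arbitrary hyperparameters (this is not the asker’s actual code):

    import numpy as np
    import tensorflow as tf

    # Toy token ids: 1 = capitalized word, 2 = comma, 0 = padding.
    # "Millie Bobby Brown"  -> [1, 1, 1]     label 0 (first middle last)
    # "Brown, Millie Bobby" -> [1, 2, 1, 1]  label 1 (last, first middle)
    x = tf.keras.preprocessing.sequence.pad_sequences([[1, 1, 1], [1, 2, 1, 1]], maxlen=4)
    y = np.array([0, 1])

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=3, output_dim=8, mask_zero=True),
        tf.keras.layers.LSTM(16),                        # final hidden state summarizes the sequence
        tf.keras.layers.Dense(2, activation="softmax"),  # predicted format label
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(x, y, epochs=5, verbose=0)

The point of attention, by contrast, is to let the model weight individual timesteps directly rather than relying on the whole sequence being compressed into the LSTM’s final hidden state.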