As part of GRU training, I want to retrieve the hidden state tensors.

I have defined a GRU with two layers:

```
self.lstm = nn.GRU(params.vid_embedding_dim, params.hidden_dim, 2)  # third positional argument is num_layers
```
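For reference, here is a minimal sketch of what a two-layer `nn.GRU` returns with the default `batch_first=False`. The sizes are hypothetical: 300 input features and 400 hidden units are inferred from the shapes in this post, and `seq_len=7`, `batch=3` are arbitrary.

```python
import torch
import torch.nn as nn

# Hypothetical sizes standing in for params.vid_embedding_dim and params.hidden_dim.
input_dim, hidden_dim, num_layers = 300, 400, 2
gru = nn.GRU(input_dim, hidden_dim, num_layers)  # batch_first defaults to False

seq_len, batch = 7, 3
x = torch.randn(seq_len, batch, input_dim)  # layout: (seq_len, batch, features)

output, h_n = gru(x)
# output: the top layer's hidden state at every time step
print(output.shape)  # torch.Size([7, 3, 400])
# h_n: the hidden state after the final time step, one slice per layer
print(h_n.shape)     # torch.Size([2, 3, 400])
```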

The forward function is defined as follows (this is only part of the implementation):

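```
def forward(self, s, order, batch_size, where, anchor_is_phrase=False):
    """
    Forward prop.
    """
    # s has shape [128, 1, 300]; 128 is the batch size
    # nn.GRU returns (output, h_n); (a, b) unpacks h_n along its layer dimension
    output, (a, b) = self.lstm(s.cuda())
    output = output.contiguous()  # contiguous() is not in-place, so assign the result
```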
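Since `nn.GRU` returns a single `h_n` tensor rather than the `(h_n, c_n)` tuple that `nn.LSTM` returns, unpacking it into `(a, b)` iterates over its first (layer) dimension, which only works here because there are exactly two layers. A small sketch with hypothetical sizes and a random input, checking which layer `a` and `b` correspond to:

```python
import torch
import torch.nn as nn

gru = nn.GRU(300, 400, 2)      # hypothetical sizes; third positional arg is num_layers
s = torch.randn(5, 1, 300)     # arbitrary input, read as (seq_len=5, batch=1, features=300)

output, h_n = gru(s)           # h_n has shape (num_layers, batch, hidden) = (2, 1, 400)

# Unpacking a tensor iterates over dim 0, so `a` is layer 0's final state
# and `b` is layer 1's (the top layer).
a, b = h_n
print(torch.equal(a, h_n[0]), torch.equal(b, h_n[1]))  # True True
print(b.shape)                                         # torch.Size([1, 400])
```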

`output` has shape [128, 400] (128 samples, each embedded in a 400-dimensional vector).

I understand that `output` is the output of the last hidden state, so I expect it to be equal to `b`. When I checked the values, they are indeed equal, but `b` contains the same vectors in a different order: for example, `output[0]` is `b[49]`. Am I missing something here?
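One thing that may be worth ruling out is how the layer interprets the first dimension of `s`. The sketch below is not the original model: it uses hypothetical sizes matching the post and a random input. With the default `batch_first=False`, a `[128, 1, 300]` input is read as 128 time steps of a batch of 1, and `b` matches the last time step `output[-1]`; with `batch_first=True`, the matching slice is `output[:, -1]`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(300, 400, 2)          # batch_first=False (the default)
s = torch.randn(128, 1, 300)       # read as (seq_len=128, batch=1, features=300)

output, (a, b) = gru(s)
print(output.shape, b.shape)       # torch.Size([128, 1, 400]) torch.Size([1, 400])
# b is the top layer's state after the LAST time step, i.e. output[-1]
print(torch.allclose(output[-1], b))  # True

# If dim 0 is meant to be the batch, the layer has to be told so explicitly.
gru_bf = nn.GRU(300, 400, 2, batch_first=True)
output_bf, h_n_bf = gru_bf(s)      # read as (batch=128, seq_len=1, features=300)
print(output_bf.shape, h_n_bf.shape)  # torch.Size([128, 1, 400]) torch.Size([2, 128, 400])
print(torch.allclose(output_bf[:, -1], h_n_bf[-1]))  # True
```

Note that `batch_first` only changes the layout of the input and `output`; `h_n` keeps the shape `(num_layers, batch, hidden_size)` either way.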

Thanks.