Position of layer normalization in the transformer model

In the Attention Is All You Need paper: "That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. We apply dropout to the output of each sub-layer, before it is added to the sub-layer input and normalized." This makes the final formula $\mathrm{LayerNorm}(x + \mathrm{Dropout}(\mathrm{Sublayer}(x)))$. However, in https://github.com/tensorflow/models/blob/0effd158ae1e6403c6048410f79b779bdf344d7d/official/transformer/model/transformer.py#L278-L288, I…
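For reference, here is a minimal PyTorch sketch (class names are my own, not from either codebase) contrasting the paper's post-LN ordering with the pre-LN ordering that many implementations use, which appears to be what the linked TensorFlow code does:

```python
import torch
import torch.nn as nn

class PostLNSublayer(nn.Module):
    """Post-LN, as stated in the paper: LayerNorm(x + Dropout(Sublayer(x)))."""
    def __init__(self, sublayer: nn.Module, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual add first, then normalize the sum.
        return self.norm(x + self.dropout(self.sublayer(x)))

class PreLNSublayer(nn.Module):
    """Pre-LN: x + Dropout(Sublayer(LayerNorm(x))); the norm moves inside the residual branch."""
    def __init__(self, sublayer: nn.Module, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize the input first; the residual path stays un-normalized.
        return x + self.dropout(self.sublayer(self.norm(x)))
```

The two orderings are not equivalent: in pre-LN the identity path from input to output is never normalized, which is often reported to make deep transformers easier to train.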

How much training data is required for a GAN?

I'm beginning to study and implement GANs to generate more data. I'll experiment with the state-of-the-art GAN models described here: https://paperswithcode.com/sota/image-generation-on-cifar-10. The problem is that I don't have a big dataset (around 1,000 images) for image classification. I have tried to train and test on my dataset with GoogLeNet and InceptionV3, and the results are…