### Position of layer normalization in transformer model

In the "Attention Is All You Need" paper: "That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself." and "We apply dropout to the output of each sub-layer, before it is added to the sub-layer input and normalized.", which makes the final formula $\mathrm{LayerNorm}(x + \mathrm{Dropout}(\mathrm{Sublayer}(x)))$. However, in https://github.com/tensorflow/models/blob/0effd158ae1e6403c6048410f79b779bdf344d7d/official/transformer/model/transformer.py#L278-L288, I…
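For readers who want to see the ordering quoted from the paper spelled out, here is a minimal NumPy sketch of $\mathrm{LayerNorm}(x + \mathrm{Dropout}(\mathrm{Sublayer}(x)))$. It is only an illustration under stated assumptions: the helper names (`layer_norm`, `dropout`, `post_ln_sublayer`) are hypothetical, the layer norm has no learned gain or bias, and the "sub-layer" is a stand-in linear map rather than attention or a feed-forward block. It does not reproduce the linked TensorFlow code.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def dropout(x, rate, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the survivors."""
    if rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def post_ln_sublayer(x, sublayer, rate, rng):
    """Ordering quoted from the paper: LayerNorm(x + Dropout(Sublayer(x)))."""
    return layer_norm(x + dropout(sublayer(x), rate, rng))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))      # (batch, sequence, model dimension), toy sizes
w = rng.normal(size=(8, 8)) * 0.1   # stand-in sub-layer weights (assumption)
y = post_ln_sublayer(x, lambda h: h @ w, rate=0.1, rng=rng)
```

The residual sum is normalized last, so every output vector has roughly zero mean and unit variance along the model dimension regardless of what the sub-layer produced.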

### Where does entropy enter in Soft Actor-Critic (SAC)?

I am currently trying to understand SAC (Soft Actor-Critic), and I am thinking of it as a basic actor-critic with the entropy included. However, I expected the entropy to appear in the Q-function. From SpinningUp-SAC it looks like the entropy enters through the value function, so I’m thinking it enters via the $\log \pi_\phi(a_t \mid s_t)$ in the…
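To make the role of the log-probability term concrete, the sketch below evaluates the soft value $V(s) = \mathbb{E}_{a \sim \pi}[Q(s,a) - \alpha \log \pi(a \mid s)]$ for a single state and checks that it equals the expected Q-value plus $\alpha$ times the policy entropy. It assumes a small discrete action set so the expectation can be computed exactly (SAC is usually applied to continuous actions), and the function name `soft_value` and all numbers are made up for illustration.

```python
import numpy as np

def soft_value(q_values, probs, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)) for a discrete policy."""
    return float(np.sum(probs * (q_values - alpha * np.log(probs))))

q = np.array([1.0, 2.0, 0.5])    # Q(s, a) for three actions (illustrative values)
pi = np.array([0.2, 0.5, 0.3])   # pi(a | s), sums to 1
alpha = 0.2                      # entropy temperature

entropy = float(-np.sum(pi * np.log(pi)))
v = soft_value(q, pi, alpha)
expected_q = float(pi @ q)
```

Because $-\sum_a \pi(a \mid s) \log \pi(a \mid s)$ is the entropy of the policy, subtracting $\alpha \log \pi$ inside the expectation is the same as adding an entropy bonus to the expected Q-value.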

### How much training data is required for a GAN?

I’m beginning to study and implement GANs to generate more data. I’ll just try to experiment with the state-of-the-art GAN models described at https://paperswithcode.com/sota/image-generation-on-cifar-10. The problem is that I don’t have a big dataset (around 1,000) for image classification. I have tried to train and test on my dataset with GoogLeNet and InceptionV3, and the results are…

### What is a probability distribution in machine learning?

If we are learning or working in the machine learning field, we frequently come across the term probability distribution. I know what probability, conditional probability, and probability distribution/density mean in math, but what does a probability distribution mean in machine learning? Take this example where x is an element of D, D being a dataset, \$x \in…

### Keras correlation coefficient as network metric in R

Does anyone know how to use the correlation coefficient or squared correlation coefficient as a metric in Keras in R (although other languages may provide clues)? This is for a CNN that functions similarly to an image compressor/decompressor. Thank you!
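As a language-neutral clue, here is the arithmetic behind the Pearson correlation coefficient written in plain NumPy. This is only a sketch of the formula, not a Keras metric: the function name `pearson_r` and the `eps` guard are assumptions for illustration, and a real Keras metric would need the same steps expressed with the framework's own tensor operations.

```python
import numpy as np

def pearson_r(y_true, y_pred, eps=1e-12):
    """Pearson correlation between two arrays, flattened to 1-D first."""
    yt = np.ravel(y_true).astype(float)
    yp = np.ravel(y_pred).astype(float)
    yt_c = yt - yt.mean()   # center both signals
    yp_c = yp - yp.mean()
    # eps guards against division by zero when one input is constant
    denom = np.sqrt(np.sum(yt_c ** 2) * np.sum(yp_c ** 2)) + eps
    return float(np.sum(yt_c * yp_c) / denom)

rng = np.random.default_rng(0)
image = rng.random((4, 4))                                   # stand-in for an original image
reconstruction = image + 0.1 * rng.normal(size=image.shape)  # stand-in for a decoded image

r = pearson_r(image, reconstruction)
r_squared = r ** 2
```

Squaring the result gives the squared correlation coefficient mentioned in the question.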

### Difference in the code structure of RNN and CNN

I understand that, in general, RNNs are good for time series data and CNNs for image data, and I have noticed many blogs explaining the fundamental differences between the models. As a beginner in machine learning and coding, I would like to know, from the code perspective, what the differences between RNN and CNN are in…
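One way to see the structural difference in code is to strip both models down to their core loops. The sketch below is a toy NumPy illustration, not any framework's implementation: `conv1d_valid` and `simple_rnn` are hypothetical names, the convolution is one-dimensional with a single kernel, and the recurrent cell uses scalar weights.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Convolution layer core: slide one shared kernel over local windows.
    Each output depends only on its own window, so the windows could be
    processed in any order or in parallel."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def simple_rnn(x, w_in, w_rec):
    """Recurrent layer core: one hidden state carried from step to step.
    Each output depends on every earlier input, so the loop is sequential."""
    h = 0.0
    states = []
    for x_t in x:
        h = np.tanh(w_in * x_t + w_rec * h)
        states.append(h)
    return np.array(states)

x = np.array([1.0, 2.0, 3.0, 4.0])
conv_out = conv1d_valid(x, np.array([1.0, 0.0, -1.0]))
rnn_out = simple_rnn(x, w_in=0.5, w_rec=0.8)
```

Both reuse the same weights at every position; the difference is that the convolution has no state between positions, while the recurrent loop feeds its previous output back in.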

### Deep Learning on time series tabular data

In this new book release, at the top of page 51 the authors mention that to do deep learning on time series tabular data the developer should structure the tensors such that the channels represent the time periods. For example, with a dataset of 17 features where each row represents an hour of a day:…

### Is it possible to create an AI/ML model to hack into any system? Or is there one?

Is it possible to create an AI/ML model to hack into any system? Or is there one? If there is one, how can it be prevented?

### Non-sequential Deep Learning models able to outperform sequential models

Can CNNs and other non-sequential deep learning models outperform LSTMs (or other sequential models) on time series data? I know this question is not very specific, but I experienced this when predicting daily store sales, and I am curious as to why it can happen. Thanks.