Truncated Neural Networks?

Recently, I’ve found good success with truncated neural networks, i.e. functions of the form $$g=f\,1_{[-M,M]^d},$$ where $f:\mathbb{R}^d\to\mathbb{R}^n$ is a feed-forward neural network and $1_{[-M,M]^d}$ is the indicator function on the cube of radius $M>0$. Has anyone come across a paper using these “truncated neural networks” instead of plain (un-truncated/classical) feed-forward neural networks?
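A minimal sketch of what such a truncated network looks like, assuming a toy stand-in for $f$ (the single-layer `f` below and its weights are hypothetical, chosen only for illustration):

```python
import math

def f(x):
    # Toy stand-in for a feed-forward network f: R^d -> R^d,
    # here a single tanh layer with arbitrary fixed weights.
    return [math.tanh(2.0 * xi - 0.5) for xi in x]

def g(x, M=1.0):
    # Truncated network g = f * 1_{[-M, M]^d}: equals f(x) when the
    # input lies inside the cube [-M, M]^d, and is zero outside it.
    if all(-M <= xi <= M for xi in x):
        return f(x)
    return [0.0] * len(x)
```

Inside the cube, `g([0.5], M=1.0)` agrees with `f([0.5])`; outside it, e.g. `g([2.0, 0.0])`, the output is identically zero.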

Difference between batches in deep q learning and supervised learning (e.g. classification)

I wonder how the batch loss is calculated in both DQNs and simple classifiers. From what I understand, a common method in a classifier is to sample a mini-batch, calculate the loss for every example, average the loss over the whole batch, and adjust the weights w.r.t. that average loss. (Please…
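The supervised-classifier procedure described above can be sketched as follows (a hedged illustration, not any particular framework's API; `cross_entropy` and `batch_loss` are hypothetical names):

```python
import math

def cross_entropy(probs, label):
    # Per-example loss: negative log-probability of the true class.
    return -math.log(probs[label])

def batch_loss(batch_probs, labels):
    # Average the per-example losses over the whole mini-batch; the
    # weight update is then taken w.r.t. this single scalar.
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)
```

For example, a batch of two examples with predicted class probabilities `[0.5, 0.5]` and `[1.0, 0.0]`, both labelled class 0, gives an average loss of `(-log 0.5 + -log 1.0) / 2`.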

When training a CNN, what are the hyperparameters to tune first?

I am training a convolutional neural network for object detection. Apart from the learning rate, what other hyperparameters should I tune, and in what order of importance? Also, I have read that a grid search over hyperparameters is not the best way to go, and that random search is better…
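To make the grid-vs-random distinction concrete, here is a minimal random-search sketch (the search space and the `evaluate` callback are hypothetical): instead of stepping through a fixed grid, each trial samples every hyperparameter independently, so the same budget probes many more distinct values along each axis.

```python
import random

def random_search(evaluate, n_trials=20, seed=0):
    # Sample hyperparameters at random rather than from a grid and
    # keep the configuration with the best score.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-5, -1),          # log-uniform learning rate
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

Sampling the learning rate log-uniformly is the usual choice, since its useful values span several orders of magnitude.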

Different formula of Cross-Entropy in Pytorch

In my understanding, cross-entropy is calculated using this formula: $$H(p,q) = -\sum_i p_i \log(q_i)$$ But in PyTorch, nn.CrossEntropyLoss is calculated using this formula: $$loss = -\log\left( \frac{\exp(x[class])}{\sum_j \exp(x_j)} \right)$$ which I think only addresses the $\log(q_i)$ part of the first formula. So does that mean PyTorch uses a different…
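The two formulas can be checked against each other in plain Python (a sketch, not the actual PyTorch implementation): when the target $p$ is one-hot, every term of $-\sum_i p_i \log(q_i)$ vanishes except $-\log(q_{class})$, and with $q = \mathrm{softmax}(x)$ that is exactly the second formula.

```python
import math

def softmax(x):
    m = max(x)                              # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def full_cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def softmax_nll(logits, cls):
    # -log( exp(x[class]) / sum_j exp(x_j) ) = -log softmax(x)[class]
    return -math.log(softmax(logits)[cls])

logits = [2.0, 1.0, 0.1]
one_hot = [0.0, 1.0, 0.0]                   # p is one-hot at class 1
a = full_cross_entropy(one_hot, softmax(logits))
b = softmax_nll(logits, 1)
# a and b coincide: only the log(q_class) term of the sum survives.
```

So the formulas agree for one-hot targets; PyTorch simply folds the softmax into the loss and takes integer class labels instead of a full distribution.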

Intelligent crossover for binary chromosomes

I’m studying genetic algorithms, in particular the different crossover operations used for binary chromosomes. These methods usually don’t use any intelligence (1-point crossover, uniform crossover, etc.). I found methods like fitness-based crossover and Boltzmann crossover, which use the fitness values so that the child is created from the better parent with higher probability. So…
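The operators mentioned above can be sketched side by side (a hedged illustration; the fitness-biased variant below is one simple way to weight genes toward the fitter parent, not a canonical definition):

```python
import random

def one_point_crossover(a, b, rng):
    # "Unintelligent": cut both parents at one random point, swap tails.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def uniform_crossover(a, b, rng):
    # "Unintelligent": each gene comes from either parent with prob 1/2.
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]

def fitness_biased_crossover(a, b, fit_a, fit_b, rng):
    # Fitness-aware: each gene is inherited from a parent with
    # probability proportional to that parent's share of the fitness.
    p_a = fit_a / (fit_a + fit_b)
    return [ai if rng.random() < p_a else bi for ai, bi in zip(a, b)]
```

In the extreme case where one parent has all the fitness, the biased operator simply copies that parent, which shows the direction of the bias.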

What’s the function that SGD takes to calculate the gradient? (deep learning)

Since this is my first post on this forum, this is bound to be a newbie question, and I’m sorry about that :,) I’m struggling to fully understand the stochastic gradient descent algorithm. I know that gradient descent allows you to find a local minimum of a function. What I don’t know is what exactly that…
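For context, the function whose gradient SGD takes is the loss viewed as a function of the network's weights, with the gradient estimated on a randomly sampled example (or mini-batch). A minimal sketch on 1-D linear regression (all names and hyperparameters hypothetical):

```python
import random

def sgd_linear_regression(data, lr=0.1, steps=2000, seed=0):
    # SGD differentiates the loss w.r.t. the *weights* (w, b),
    # using one randomly sampled example per step ("stochastic").
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(data)       # the stochastic part: sample an example
        err = (w * x + b) - y         # derivative of 0.5*(pred - y)^2 w.r.t. pred
        w -= lr * err * x             # chain rule: dloss/dw = err * x
        b -= lr * err                 # dloss/db = err
    return w, b

# Noiseless data from y = 2x + 1; SGD should recover roughly w=2, b=1.
data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
w, b = sgd_linear_regression(data)
```

The key point: `x` and `y` are fixed data, and the minimization runs over the parameter space `(w, b)`, not over the inputs.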

Training dataset for convolutional neural network classification – will images captured on the ground be useful for training aerial imagery?

I am an agronomy graduate student looking to classify crops versus weeds using convolutional neural networks (CNNs). The basic idea is to separate crops from weeds in aerial imagery (captured either by drones or by piloted aircraft). The project I am proposing involves spending some time…

word2vec implementation in TensorFlow 2.0

I want to implement word2vec using TensorFlow 2.0. I have prepared a dataset according to the skip-gram model and have roughly 18 million observations (target and context words). I used the following dataset for this: https://www.kaggle.com/c/quora-question-pairs/notebooks I created a new dataset for the skip-gram model, using a window size of 2 and a number of…
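The skip-gram pair generation described above (window size 2) can be sketched framework-free like this — a hedged illustration of the preprocessing step, independent of TensorFlow:

```python
def skipgram_pairs(tokens, window_size=2):
    # For each target word, emit a (target, context) pair for every
    # word at most `window_size` positions away on either side.
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs
```

On a long corpus this yields roughly `2 * window_size` pairs per token, which is how a modest corpus balloons into millions of observations.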

How can a DQN backpropagate its loss?

G’day guys, I’m currently trying to take the next step in deep learning. So far I have managed to write my own basic feed-forward network in Python without any frameworks (just NumPy and pandas), so I think I understand the math and intuition behind backpropagation. Now I’m stuck on deep Q-learning. I’ve tried to get an…
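For context on how the DQN loss hooks into ordinary backprop: the network's Q-values are regressed toward the TD target $r + \gamma \max_{a'} Q(s', a')$, and only the output for the action actually taken carries a nonzero error. One common way to set this up is sketched below (a hedged illustration; `dqn_targets` is a hypothetical helper, not from any library):

```python
def dqn_targets(q_values, action, reward, next_q_values, gamma=0.99, done=False):
    # Build the regression target for one transition: copy the network's
    # own outputs, then overwrite the entry for the action taken with the
    # TD target r + gamma * max_a' Q(s', a') (just r if the episode ended).
    # Every other output keeps zero error, so backprop only pushes gradient
    # through the chosen action's Q-value, exactly as in supervised MSE.
    target = list(q_values)
    td = reward if done else reward + gamma * max(next_q_values)
    target[action] = td
    return target
```

With this target vector in hand, the loss is plain mean-squared error between `q_values` and `target`, and backpropagation proceeds exactly as in a supervised feed-forward network.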
