### Where is the LSTM Component in an Artificial Neural Network?

Given the standard illustrative feed-forward neural network diagram, with dots as neurons and lines as neuron-to-neuron connections, what part corresponds to the (unfolded) LSTM cell (see picture)? Is it a neuron (a dot) or a layer?

### What are the state-of-the-art meta-reinforcement learning methods?

This question may seem a little broad, but I am wondering what the current state-of-the-art works on meta-reinforcement learning are. Can you point me to the current state of the art in this field?

### Weight initialization in neural networks

How do newer weight-initialization techniques (He, Xavier, etc.) improve results over zero or random initialization of weights in a neural network? Is there any mathematical evidence behind this?
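A minimal NumPy sketch of the intuition behind He initialization (layer sizes and network depth here are hypothetical): with zero weights the signal dies immediately, while scaling the weight variance to $2/\text{fan\_in}$ keeps ReLU activations from vanishing as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [256] * 6  # a toy 6-layer ReLU network (hypothetical sizes)

def forward(init):
    """Propagate a random input and record the activation std per layer."""
    x = rng.standard_normal((32, layer_sizes[0]))
    stds = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        if init == "zero":
            W = np.zeros((fan_in, fan_out))
        else:
            # He initialization: weight variance 2/fan_in roughly preserves
            # the second moment of ReLU activations from layer to layer.
            W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        x = np.maximum(x @ W, 0.0)  # ReLU
        stds.append(x.std())
    return stds

zero_stds = forward("zero")
he_stds = forward("he")
print(zero_stds[-1])  # 0.0 -- the signal (and hence every gradient) vanishes
print(he_stds[-1])    # stays well away from zero even at the last layer
```

The same variance argument, worked out analytically, is the mathematical evidence the question asks about: without the $2/\text{fan\_in}$ scaling, activation variance shrinks or explodes geometrically with depth.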

### Solution path of a search algorithm on a graph

I’m working on a problem where we are given a graph and asked to perform various search algorithms (BFS, DFS, UCS, A*, etc.), and the goal is to visit all nodes in the graph. After all nodes are visited, we are to print out the “solution path.” I am a bit confused about what…
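One reasonable reading of “solution path” for a visit-all-nodes goal is simply the order in which the search first visits each node. A minimal BFS sketch on a hypothetical adjacency-list graph:

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

def bfs_visit_order(graph, start):
    """Return the order in which BFS first visits every node."""
    visited, order, frontier = {start}, [], deque([start])
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(nbr)
    return order

print(bfs_visit_order(graph, "A"))  # ['A', 'B', 'C', 'D']
```

Note this is a traversal order, not a walkable path: consecutive nodes in the list need not be adjacent, which is likely the source of the confusion the question describes.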

### Trained a regression network and getting EXACT same result on validation set, on every epoch

I trained this network from this GitHub repository. The training went well and returns nice results for new, unseen images. During training, the loss decreased, so I must assume the weights changed as well. I saved a snapshot of the net every epoch. When trying to run a validation set through each epoch’s…

### Creating a noising model for NLP that models human noising

I’m trying to create a noising model that accurately reflects how people would noise name data. I was thinking of randomly switching out characters, with a probability distribution over which character gets switched in based on keyboard closeness and how visually similar another character is. For example, “l” has a higher prob of…
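A minimal sketch of the idea, assuming a hand-built keyboard-adjacency map (the map, the substitution probability, and the name are all hypothetical; a real model would learn the substitution distribution and cover visual look-alikes too):

```python
import random

# Tiny hypothetical QWERTY-neighbour map; a real model would cover the
# full layout and weight visually similar characters as well.
ADJACENT = {"l": "kop", "a": "qsz", "e": "wrd", "n": "bhm"}

def noise_name(name, p=0.1, rng=random.Random(0)):
    """With probability p, replace each character by a uniformly chosen
    keyboard neighbour -- a stand-in for a learned substitution model."""
    out = []
    for ch in name:
        if ch in ADJACENT and rng.random() < p:
            out.append(rng.choice(ADJACENT[ch]))
        else:
            out.append(ch)
    return "".join(out)

print(noise_name("allan", p=0.5))  # e.g. a plausibly mistyped variant
```

Replacing the uniform `rng.choice` with per-pair weights is where the “higher probability for closer keys” part of the design would plug in.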

### In $\log p_{\theta}(x^1,\dots,x^N)=D_{KL}(q_{\theta}(z|x^i)\,\|\,p_{\phi}(z|x^i))+\mathbb{L}(\phi,\theta;x^i)$, why is $\theta$ a parameter for both $p$ and $q$?

In $\log p_{\theta}(x^1,\dots,x^N)=D_{KL}(q_{\theta}(z|x^i)\,\|\,p_{\phi}(z|x^i))+\mathbb{L}(\phi,\theta;x^i)$, why is $\theta$ a parameter for both $p$ and $q$? Why do $p(x^1,\dots,x^N)$ and $q(z|x^i)$ have the same parameter $\theta$? Since $p$ is just the probability of the observed data and $q$ is the approximation of the posterior, shouldn’t they be different distributions with different parameters?
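For comparison, the common convention (e.g. in Kingma and Welling’s VAE paper) does keep the parameters separate, with $\theta$ for the generative model $p$ and $\phi$ for the approximate posterior $q$, so the per-datapoint decomposition reads:

```latex
\log p_{\theta}(x^i)
  = D_{KL}\!\left( q_{\phi}(z \mid x^i) \,\|\, p_{\theta}(z \mid x^i) \right)
  + \mathcal{L}(\theta, \phi; x^i)
```

Here the KL term compares the approximate posterior $q_{\phi}$ against the true posterior $p_{\theta}(z \mid x^i)$, each indexed by its own parameters, which suggests the subscripts in the quoted equation have simply been swapped.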

### Strategy of using intermediate layers of a neural network as features?

There is a popular strategy of using a neural network trained on one task to produce features for another related task by “chopping off” the top of the network and sewing the bottom onto some other modeling pipeline. Word2Vec models employ this strategy, for example. Is there an industry-popular term for this strategy? Are there…
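This pattern is commonly described as transfer learning via a pretrained feature extractor (the truncated outputs are often called embeddings). A minimal NumPy sketch with a hypothetical two-layer network standing in for the pretrained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network "trained" on some source task.
W1 = rng.standard_normal((10, 8))   # bottom layer: reusable representation
W2 = rng.standard_normal((8, 3))    # top layer: task-specific head

def full_model(x):
    """Original source-task model: bottom layer, ReLU, then the head."""
    return np.maximum(x @ W1, 0.0) @ W2

def feature_extractor(x):
    """'Chop off' the head; the penultimate activations become features."""
    return np.maximum(x @ W1, 0.0)

x = rng.standard_normal((4, 10))
feats = feature_extractor(x)  # shape (4, 8): inputs for a new pipeline
print(feats.shape)
```

The extracted `feats` can then be fed to any downstream model (logistic regression, gradient boosting, etc.) in place of raw inputs.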

### Which is a better form of Regularization – Lasso (L1) or Ridge (L2)?

Given a Ridge and a Lasso Regularizer, which one should be chosen for better performance? An intuitive graphical explanation (intersection of the elliptical contours of the loss function with the region of constraints) would be helpful.
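The key behavioural difference the contour picture illustrates can also be seen algebraically in one dimension: for a single coefficient, the Lasso solution is the soft-threshold operator (which sets small coefficients exactly to zero), while Ridge only shrinks proportionally. A minimal sketch:

```python
import numpy as np

def lasso_1d(a, lam):
    """Minimiser of 0.5*(w - a)**2 + lam*|w|: the soft-threshold operator."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

def ridge_1d(a, lam):
    """Minimiser of 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return a / (1.0 + lam)

for a in [0.3, 1.5]:
    print(lasso_1d(a, 0.5), ridge_1d(a, 0.5))
# Lasso zeroes the small coefficient (0.3 -> 0.0); Ridge only shrinks it.
```

This is why Lasso is preferred when sparsity/feature selection matters and Ridge when all features are believed to contribute; "better performance" depends on that prior, not on one being uniformly superior.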

### Uniform Cost Search Algorithm from Russell & Norvig Artificial Intelligence Book

On page 84 of Russell & Norvig’s Artificial Intelligence book, 3rd ed., the algorithm for uniform-cost search is given. I provided a screenshot of it here for your convenience. I am having trouble understanding the highlighted line: if child.STATE is not in explored **or** frontier then. Shouldn’t that be if child.STATE is not in explored **and**…
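The book’s line is English shorthand for membership in neither set, i.e. `state not in explored and state not in frontier` (equivalently, not in their union), which matches the questioner’s intuition. A minimal UCS sketch making that test explicit (graph and costs are hypothetical):

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """UCS sketch; graph maps node -> list of (neighbour, step_cost)."""
    frontier = [(0, start)]       # priority queue ordered by path cost
    frontier_costs = {start: 0}   # best known cost per frontier state
    explored = set()
    while frontier:
        cost, state = heapq.heappop(frontier)
        if state == goal:
            return cost           # goal test at expansion => optimal cost
        if state in explored:
            continue              # stale queue entry, skip it
        explored.add(state)
        for nbr, step in graph[state]:
            new_cost = cost + step
            # "not in explored AND not in frontier" -- or already in the
            # frontier but reachable more cheaply (the book's else-branch).
            if nbr not in explored and (nbr not in frontier_costs
                                        or new_cost < frontier_costs[nbr]):
                frontier_costs[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr))
    return None

g = {"A": [("B", 1), ("C", 5)], "B": [("C", 1)], "C": []}
print(uniform_cost_search(g, "A", "C"))  # 2 (via B), not 5 (direct)
```

Reading the pseudocode with a literal boolean **or** would re-add states already on the frontier, which is exactly the behaviour the combined membership test prevents.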