### Getting started with creating a general AI based on textual and then image based data?

I have a pool of knowledge that I want to mine for information and allow an AI to deduce likely conclusions from this information. My goal is to give the AI a set of textual data that is rated on a scale of 0 to 100 ranging from false (0) to unequivocally true (100). Based…

### In deep learning, is it possible to use discontinuous activation functions?

In deep learning, is it possible to use discontinuous activation functions (e.g. one with a jump discontinuity)? (My guess: for example, ReLU is non-differentiable at a single point, but it still has a well-defined derivative. If an activation function has a jump discontinuity, then its derivative is supposed to have a delta function at that point.…

### Deep Q-learning (DQN) conceptual questions

I am trying to understand how Deep Q-learning (DQN) works. To my current understanding, each $Q(s,a)$ function is estimated as a function of a feature vector of its state, $\phi(s)$, and the weights of the network, $\theta$. The loss function to minimise is $||\delta_{t+1}||^2$ where $\delta_{t+1}$ is shown below. Intuitively, I…

### What class of problem is this?

I have a lot of input-output pairs as training data <float Xi, float Yi>, and I have a parametrized function (I know the function's algorithm, but not the values of the many parameters it contains). The function takes two input values: // c is a precomputed classifier for x and can have…

### What is the intuition behind TD($\lambda$)?

I’d like to better understand temporal-difference learning. In particular, I’m wondering if it is prudent to think about TD($\lambda$) as a type of “truncated” Monte Carlo learning?

### Why is Multi-agent Deep Deterministic Policy Gradient (MADDPG) running slowly and using only 22% of the GPU?

I already asked this question on Stack Overflow. I need to run the Distributed Multi-Agent Cooperation Algorithm based on MADDPG with prioritized batch data, increasing the number of agents to 12, but it takes a lot of time to train 3500 episodes. I have tried different settings but nothing is working.…

### Epsilon greedy vs Softmax Policy

Could someone explain to me what the key difference is between the epsilon-greedy policy and the softmax policy? In particular, in the context of the SARSA and Q-Learning algorithms. I understood the main difference between these two algorithms, but I didn’t understand all the combinations between algorithm and policy: SARSA + Epsilon, SARSA + Softmax…
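For readers comparing the two policies, here is a minimal sketch of how each one turns a list of Q-values into action probabilities. The function names, the `epsilon` and `temperature` parameters, and the example Q-values are assumptions made for illustration; they do not come from the question. Either policy could be plugged into SARSA or Q-Learning as the behaviour policy.

```python
import math
import random

def epsilon_greedy_probs(q_values, epsilon):
    """Epsilon-greedy: with probability epsilon pick uniformly at random,
    otherwise pick the greedy action. Non-greedy actions all get the same
    probability, regardless of how bad their Q-values are."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs

def softmax_probs(q_values, temperature):
    """Softmax (Boltzmann): probabilities are graded by Q-value, so a
    slightly worse action is chosen more often than a much worse one."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical Q-values for three actions in some state.
q = [1.0, 2.0, 0.0]
eg = epsilon_greedy_probs(q, epsilon=0.3)
sm = softmax_probs(q, temperature=1.0)
action = sample_action(sm)
```

With these example values, epsilon-greedy gives actions 0 and 2 the same exploration probability, while softmax prefers action 0 over action 2 because its Q-value is higher.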

### Off-policy evaluation in reinforcement learning

The IPS estimator, which is used for off-policy evaluation in a contextual bandit problem, is well explained here: Doubly Robust Policy Evaluation and Optimization https://arxiv.org/pdf/1503.02834.pdf The old policy $\mu$, or the behavior policy, may be non-stationary in the IPS estimator, even though the new policy $\nu$, or the target policy, should be stationary. I wonder…
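As a point of reference, the following is a minimal sketch of an inverse propensity score (IPS) estimate for a contextual bandit log. It assumes each logged record stores the probability $\mu(a \mid x)$ with which the behavior policy chose the logged action; the record layout and the names `ips_estimate` and `target_prob` are hypothetical, chosen only for this illustration.

```python
def ips_estimate(logged, target_prob):
    """Estimate the value of a target policy from logged bandit data.

    logged: iterable of (context, action, reward, behavior_prob), where
            behavior_prob is mu(action | context) at logging time (> 0).
    target_prob: function (context, action) -> nu(action | context).
    """
    total = 0.0
    n = 0
    for context, action, reward, behavior_prob in logged:
        # Importance weight: how much more (or less) likely the target
        # policy is to take the logged action than the behavior policy was.
        weight = target_prob(context, action) / behavior_prob
        total += weight * reward
        n += 1
    return total / n

# Hypothetical log: the behavior policy picked each of two actions with
# probability 0.5; action 0 paid reward 1.0 and action 1 paid 0.0.
log = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]

# Hypothetical target policy that always plays action 0.
always_zero = lambda context, action: 1.0 if action == 0 else 0.0

value = ips_estimate(log, always_zero)
```

Because the logged propensity is stored per record, the sketch only needs the behavior policy's probability at the moment each action was logged.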