the role of probability in supervised learning

In the literature and textbooks, one often sees supervised learning expressed as a conditional probability, e.g., $\rho(\vec{y}|\vec{x},\vec{\theta})$, where $\vec{\theta}$ denotes a learned set of network parameters, $\vec{x}$ is an arbitrary input, and $\vec{y}$ is an arbitrary output. If we assume we have already learned $\vec{\theta}$, then, in words, $\rho(\vec{y}|\vec{x},\vec{\theta})$ is…
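
To make the notation $\rho(\vec{y}|\vec{x},\vec{\theta})$ concrete, here is a minimal sketch under some loud assumptions that are not from the text above: the output $\vec{y}$ is a discrete class label, the "network" is a single linear layer followed by a softmax, and the parameters $\vec{\theta}$ are stored in a hypothetical dict with keys `"W"` and `"b"` whose values are made up for illustration. The function name `conditional_distribution` is likewise just illustrative.

```python
import math

def conditional_distribution(x, theta):
    """Return rho(y | x, theta) for every class y as a list of probabilities.

    theta is a hypothetical parameter set: a dict with a weight matrix "W"
    (one row per class) and a bias vector "b" (one entry per class).
    """
    # One logit per class: W[k] . x + b[k]
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk
              for row, bk in zip(theta["W"], theta["b"])]
    # Softmax, shifted by the largest logit for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up "already learned" parameters: 3 classes, 2 input features
theta = {"W": [[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]], "b": [0.0, 0.1, -0.1]}
x = [0.2, 0.7]  # an arbitrary input

probs = conditional_distribution(x, theta)
for k, p in enumerate(probs):
    print(f"rho(y={k} | x, theta) = {p:.3f}")
```

With $\vec{\theta}$ held fixed, calling the function on a different `x` gives a different distribution over the same set of outputs, which is one way to read the conditioning bar in the notation.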
