### Why is a softmax used rather than dividing each activation by the sum?

Just wondering why a softmax is typically applied to the outputs of most neural nets in practice, rather than just summing the activations and dividing each activation by that sum. I know the two are roughly the same thing, but what is the mathematical reasoning behind using a softmax over plain division by the sum? Is it better in some…
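
For concreteness, here is a minimal sketch of the two normalizations being compared. The function names `softmax` and `sum_normalize` and the activation values are made up for illustration, and subtracting the maximum inside `softmax` is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(activations):
    """Exponentiate each activation, then divide by the sum of the exponentials."""
    # Shifting by the max avoids overflow in exp() and leaves the output unchanged.
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def sum_normalize(activations):
    """Divide each activation by the plain sum of the activations."""
    total = sum(activations)  # assumes the sum is nonzero
    return [a / total for a in activations]


# Hypothetical raw outputs of a network; note that one of them is negative.
activations = [2.0, 1.0, -1.0]

soft = softmax(activations)
plain = sum_normalize(activations)

print("softmax:       ", soft)
print("sum-normalized:", plain)
```

With these values the sum-normalized vector still adds up to 1 but contains a negative entry, while every softmax output is strictly positive.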

