How does maximum a posteriori (MAP) estimation choose a distribution?

I was learning about maximum a posteriori probability (MAP) estimation for machine learning and found a nice short video that essentially explained it as choosing a distribution and tweaking its parameters to fit the observed data so that the observations become as likely as possible (which makes sense). However, in mathematical terms, how does…
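In symbols, MAP picks the parameter value θ that maximizes the posterior, p(θ | data) ∝ p(data | θ) · p(θ). Dropping the prior factor p(θ) leaves maximum likelihood (MLE), which is what "make the observations most likely" describes on its own; the prior is what makes MAP different. The sketch below is a minimal illustration under assumed conditions: coin-flip (Bernoulli) data, a Beta(2, 2) prior chosen only for the example, and hypothetical helper names. It maximizes the log posterior by brute force on a grid and checks the result against the known closed form.

```python
import math

def log_posterior(theta, heads, flips, a, b):
    """Unnormalised log posterior: log p(data | theta) + log p(theta).

    Assumes a Bernoulli likelihood and a Beta(a, b) prior. The evidence
    p(data) is dropped because it does not depend on theta, so it does
    not change where the maximum is.
    """
    log_lik = heads * math.log(theta) + (flips - heads) * math.log(1 - theta)
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return log_lik + log_prior

def map_grid(heads, flips, a, b, steps=10_000):
    # Brute-force argmax over grid points strictly inside (0, 1).
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=lambda t: log_posterior(t, heads, flips, a, b))

def map_closed_form(heads, flips, a, b):
    # Mode of the Beta(heads + a, flips - heads + b) posterior;
    # valid when both posterior parameters are greater than 1.
    return (heads + a - 1) / (flips + a + b - 2)

def mle(heads, flips):
    # Maximum likelihood: the prior is ignored entirely.
    return heads / flips

heads, flips = 3, 3   # three flips, all heads
a, b = 2.0, 2.0       # hypothetical prior: mild belief the coin is fair
print(mle(heads, flips))
print(map_closed_form(heads, flips, a, b))
print(round(map_grid(heads, flips, a, b), 3))
```

With three heads out of three flips, MLE concludes the coin always lands heads (1.0), while MAP with this prior gives 0.8, and the grid search agrees with the closed form.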

Why do Bayesian algorithms work well with small datasets?

I often read that Bayesian algorithms work well on small datasets. Why is that? My guess is that they generalize well, but what makes them generalize well when data is scarce?
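One commonly cited reason can be made concrete with a conjugate Beta-Bernoulli model. The prior acts like a few extra pseudo-observations, which keeps the estimate from overfitting a handful of data points, and the posterior also reports how uncertain the estimate still is. Both effects matter most when n is small and fade as n grows. The sketch below assumes coin-flip data, a Beta(2, 2) prior, and hypothetical helper names; it only helps if the prior is reasonable, since a badly chosen prior biases small-sample estimates.

```python
import math

def beta_posterior(heads, flips, a=2.0, b=2.0):
    """Return (alpha, beta) of the Beta posterior for a Bernoulli rate.

    A Beta(a, b) prior plus `heads` successes in `flips` trials gives
    Beta(a + heads, b + flips - heads), so a and b behave like pseudo-counts.
    """
    return a + heads, b + (flips - heads)

def posterior_mean_sd(alpha, beta):
    # Mean and standard deviation of a Beta(alpha, beta) distribution.
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)

for flips in (3, 30, 300):
    heads = flips  # every flip came up heads: an extreme small-data case
    mle = heads / flips
    mean, sd = posterior_mean_sd(*beta_posterior(heads, flips))
    print(f"n={flips:>3}  MLE={mle:.3f}  posterior mean={mean:.3f}  sd={sd:.3f}")
```

With three heads the MLE jumps to 1.0, while the posterior mean stays near 0.71 with a wide standard deviation (about 0.16); by n = 300 the two estimates nearly agree and the uncertainty is small, because the data now outweigh the prior's pseudo-counts.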