How does maximum a posteriori (MAP) estimation choose a distribution?

I was learning about maximum a posteriori (MAP) estimation for machine learning and found a nice short video that essentially explained it as choosing a distribution and tweaking its parameters to fit the observed data so that the observations are as likely as possible (which makes sense). However, in mathematical terms, how does…
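In symbols, MAP picks θ_MAP = argmax_θ p(θ | x) = argmax_θ p(x | θ) p(θ). Maximizing p(x | θ) alone, which is what "making the observations most likely" describes, is maximum likelihood estimation (MLE); MAP additionally weights each candidate θ by a prior p(θ). The sketch below assumes a coin-flip (Bernoulli) model with a Beta(a, b) prior, chosen only because its MAP estimate has a simple closed form; the function names are hypothetical. It finds the MAP estimate both by brute-force grid search over θ and by the closed form, and compares them with the MLE.

```python
import math


def log_unnormalized_posterior(theta: float, heads: int, tails: int, a: float, b: float) -> float:
    """log p(x | theta) + log p(theta) for a Bernoulli likelihood and a Beta(a, b) prior,
    dropping additive constants that do not depend on theta."""
    return (heads + a - 1) * math.log(theta) + (tails + b - 1) * math.log(1 - theta)


def map_by_grid_search(heads: int, tails: int, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the MAP estimate by evaluating the log posterior on a grid in (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]  # cell midpoints avoid log(0)
    return max(grid, key=lambda t: log_unnormalized_posterior(t, heads, tails, a, b))


def map_closed_form(heads: int, tails: int, a: float, b: float) -> float:
    """Mode of the Beta(heads + a, tails + b) posterior; valid when both parameters exceed 1."""
    return (heads + a - 1) / (heads + tails + a + b - 2)


def mle(heads: int, tails: int) -> float:
    """Maximum likelihood: maximizes p(x | theta) alone, with no prior."""
    return heads / (heads + tails)


if __name__ == "__main__":
    heads, tails = 7, 3
    print("MLE:", mle(heads, tails))
    print("MAP, Beta(2, 2) prior, closed form:", map_closed_form(heads, tails, 2, 2))
    print("MAP, Beta(2, 2) prior, grid search:", map_by_grid_search(heads, tails, 2, 2))
```

With a uniform Beta(1, 1) prior the prior term is constant, so MAP and MLE coincide; with Beta(2, 2) the estimate is pulled from 0.7 toward 0.5 (here to 2/3).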

Why do Bayesian algorithms work well with small datasets?

I often read that Bayesian algorithms work well on small datasets. Why is that? (I think it is because they generalize well, but why would that be?)

https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=7636&context=etd
https://math.stackexchange.com/questions/2589224/how-is-bayesian-inference-better-than-classical-inference-on-small-samples

What is the reason Bayesian methods work well on small datasets?
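One commonly cited part of the answer is that the prior acts as a regularizer: with few observations it keeps estimates away from extreme values the data cannot yet support, and its influence fades as data accumulate. The posterior also reports how uncertain the estimate still is. A minimal sketch under illustrative assumptions (a coin-flip model, an arbitrary Beta(2, 2) prior, made-up counts, hypothetical function names) compares the maximum likelihood estimate with the posterior mean and standard deviation as the sample grows:

```python
import math


def mle(heads: int, n: int) -> float:
    """Maximum likelihood estimate of P(heads): uses the data only."""
    return heads / n


def posterior_mean(heads: int, n: int, a: float = 2.0, b: float = 2.0) -> float:
    """Beta(a, b) prior + binomial data -> Beta(a + heads, b + n - heads) posterior; return its mean."""
    return (a + heads) / (a + b + n)


def posterior_sd(heads: int, n: int, a: float = 2.0, b: float = 2.0) -> float:
    """Standard deviation of the Beta(a + heads, b + n - heads) posterior."""
    alpha, beta = a + heads, b + n - heads
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))


if __name__ == "__main__":
    for heads, n in [(3, 3), (30, 30), (700, 1000)]:
        print(f"{heads}/{n}: MLE={mle(heads, n):.3f}  "
              f"posterior mean={posterior_mean(heads, n):.3f} +/- {posterior_sd(heads, n):.3f}")
```

After 3 heads in 3 flips the MLE is 1.0 (it treats tails as impossible), while the posterior mean is 5/7 ≈ 0.71 with a wide spread; at 700 heads in 1000 flips the two estimates nearly agree. The flip side is that a poorly chosen prior biases small-sample results just as strongly.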