How to deal with random weight initialization in hyperparameter tuning?

When tuning my neural networks, I often run into the problem that training the exact same network twice gives a different final error because of the random initialization of the weights. Sometimes the differences are small and negligible; sometimes they are significant, depending on the data and architecture. My problem arises…
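
One common way to handle this is to score each hyperparameter configuration over several random seeds and compare configurations by their mean (and spread) of validation error rather than a single run. A minimal sketch, assuming PyTorch and a placeholder `train_and_validate(config)` that stands in for your own training loop:

```python
import random
import statistics

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Fix the RNGs that drive weight initialization and data shuffling."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def evaluate_config(config: dict, seeds=(0, 1, 2)):
    """Return mean and std of the validation error over several seeds."""
    errors = []
    for seed in seeds:
        set_seed(seed)
        # train_and_validate is a placeholder for your own pipeline: it should
        # build the model from `config`, train it, and return the final
        # validation error for this run.
        errors.append(train_and_validate(config))
    return statistics.mean(errors), statistics.stdev(errors)
```

Comparing means across seeds makes the tuning decision less sensitive to a lucky or unlucky initialization, at the cost of extra training runs per configuration.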

What’s the mathematical relationship between the number of trainable parameters and the size of the training set?

Suppose I have a pretrained model whose pretraining set is very different from my training set, and I unfreeze layers with X trainable parameters. How large should the training set be, with or without data augmentation, for multi-class or multi-label image classification with Y labels?
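
There is no closed-form rule tying X to dataset size, but X itself is easy to measure once you decide which layers to unfreeze. A minimal sketch, assuming PyTorch/torchvision with a ResNet-18 purely as an illustrative backbone (unfreezing `layer4` and `fc` is an assumption, not part of the original question):

```python
import torch
from torchvision import models

# Load a pretrained backbone (illustrative choice only).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze everything, then unfreeze the last block and the classifier head.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# X = number of parameters that will actually be updated during fine-tuning.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"X = {trainable} trainable parameters")
```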

Can training a model on a dataset composed of real images and drawings hurt a model intended for a real-world application?

I’m training a classifier that will be evaluated on underwater images. I’m wondering whether feeding the model drawings of a certain class, in addition to real images, could hurt the results. Has there been a study on this, or are there past experiences anyone could share?
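
One way to get a concrete answer for your own data is a small ablation: train the same architecture once on real images only and once on real images plus drawings, then evaluate both on a held-out set of real underwater images. A minimal sketch, where `build_model`, `train`, `evaluate`, `real_images`, `drawings`, and `real_underwater_val` are placeholders for your own pipeline:

```python
# Hypothetical ablation: compare training with and without drawings,
# judged only on real underwater validation images.
results = {}
for name, train_set in {
    "real_only": real_images,
    "real_plus_drawings": real_images + drawings,  # assumes list-like datasets
}.items():
    model = build_model()            # placeholder: builds a fresh classifier
    train(model, train_set)          # placeholder: your training loop
    results[name] = evaluate(model, real_underwater_val)  # real-image accuracy

print(results)  # keep the drawings only if they do not hurt real-image accuracy
```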