It is commonly agreed that, given a large enough dataset and a large enough neural network, every (computational) problem is solvable. In practice this introduces three distinct problems:

  1. data requirements
  2. time/hardware/money to train such a model
  3. inference hardware

Similar to latency vs. throughput considerations, model size considerations are very important, and they apply to training as well as deployment.
While the deployment side leads to the latency vs. throughput discussion, the training side is mainly constrained by the available budget and datasets.
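
To make the latency vs. throughput trade-off concrete, here is a minimal sketch, assuming PyTorch and a toy MLP as a stand-in for a real model: it measures per-batch latency and samples-per-second throughput at different batch sizes, where larger batches typically buy throughput at the cost of latency.

```python
import time
import torch

# Toy stand-in model; any nn.Module you actually deploy would go here.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

@torch.no_grad()
def measure(batch_size: int, n_iters: int = 50) -> tuple[float, float]:
    """Return (latency per batch in ms, throughput in samples/s)."""
    x = torch.randn(batch_size, 512)
    model(x)  # warm-up pass
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / n_iters * 1000
    throughput = batch_size * n_iters / elapsed
    return latency_ms, throughput

for bs in (1, 8, 64, 256):
    lat, thr = measure(bs)
    print(f"batch={bs:>3}  latency={lat:7.2f} ms  throughput={thr:10.0f} samples/s")
```

The same kind of measurement, run on the actual target hardware, is what ultimately decides which model size is deployable at all.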

With respect to training, the model used for inference obviously needs to be trained first. Beyond that, very large models on the training side can be used to pre-annotate data and thereby significantly reduce the manual correction effort for annotations. The resulting high-quality ground truth annotations can then feed into, e.g., neural architecture search (NAS) to find models that perform optimally under the given inference constraints (see the sketch below). However, the resources, model sizes, and search space for NAS come with similar budget limitations. There is a constant trade-off between model size and usefulness, i.e., between a suitable minimum and optimal performance.
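
A minimal sketch of constraint-aware NAS, assuming PyTorch, a toy MLP search space, a parameter-count budget as a stand-in for the real inference constraint, and a placeholder `evaluate()` instead of actual training: candidates that violate the constraint are discarded, and the best-scoring remaining one is kept. Real NAS frameworks are far more elaborate, but the budget-filtering logic is the same.

```python
import itertools
import random
import torch

# Hypothetical search space: width and depth of a small MLP.
SEARCH_SPACE = list(itertools.product((64, 128, 256, 512), (2, 3, 4, 5)))

PARAM_BUDGET = 300_000  # inference constraint, e.g. derived from the target hardware


def build(width: int, depth: int) -> torch.nn.Module:
    layers: list[torch.nn.Module] = [torch.nn.Linear(128, width), torch.nn.ReLU()]
    for _ in range(depth - 1):
        layers += [torch.nn.Linear(width, width), torch.nn.ReLU()]
    layers.append(torch.nn.Linear(width, 10))
    return torch.nn.Sequential(*layers)


def param_count(model: torch.nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())


def evaluate(model: torch.nn.Module) -> float:
    # Placeholder: in practice this would be a (proxy) validation score
    # obtained after training on the high-quality ground truth annotations.
    return random.random()


best_score, best_cfg = -1.0, None
for width, depth in SEARCH_SPACE:
    candidate = build(width, depth)
    if param_count(candidate) > PARAM_BUDGET:
        continue  # violates the inference constraint, skip it
    score = evaluate(candidate)
    if score > best_score:
        best_score, best_cfg = score, (width, depth)

print("best config within budget:", best_cfg, "score:", round(best_score, 3))
```

Even this toy search makes the budget limitation visible: every candidate that is evaluated costs training time, so the size of the search space itself is bounded by the same resources as the models it produces.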