You are training a deep neural network on a large dataset using batch training. After several epochs, you notice that the training loss is oscillating significantly and the model is not converging as expected. The current batch size is 256, and the learning rate is set to 0.01. The dataset is high-dimensional with a mix of feature scales, and you are constrained to minimize training time without compromising model performance. Which of the following adjustments would be MOST effective to ensure convergence? Choose the best two options.
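
The answer options are not shown here, but the scenario points to two common remedies: standardizing the mixed-scale features and lowering the learning rate (or switching to an adaptive optimizer) to damp the oscillation. The sketch below is illustrative only, not the question's official answer; the synthetic dataset, layer sizes, and the choice of Adam at lr=1e-3 are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the high-dimensional, mixed-scale dataset
# described in the question (sizes are assumptions, not from the source).
torch.manual_seed(0)
X = torch.randn(10_000, 512) * torch.logspace(-2, 2, 512)  # feature scales span 1e-2..1e2
y = torch.randint(0, 2, (10_000, 1)).float()

# Adjustment 1: standardize features so every dimension shares a common scale;
# mixed scales make the loss surface ill-conditioned and the updates oscillate.
X = (X - X.mean(dim=0)) / (X.std(dim=0) + 1e-8)

model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
criterion = nn.BCEWithLogitsLoss()

# Adjustment 2: lower the learning rate (0.01 -> 0.001) and use an adaptive
# optimizer so per-parameter step sizes shrink where gradients are noisy.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=256, shuffle=True
)

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss = {loss.item():.4f}")
```

Keeping the batch size at 256 preserves throughput, so neither adjustment conflicts with the constraint to minimize training time.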