
Answer-first summary for fast verification
Answer: (B) Decrease the learning rate to 0.001 to allow more precise updates to the model weights, and (D) Normalize the input features to zero mean and unit variance to stabilize training.
Oscillations in the training loss suggest that the learning rate may be too high, causing the model to overshoot the minimum of the loss function. Decreasing the learning rate (Option B) can help the model to converge more reliably by making smaller, more precise updates to the weights. Additionally, normalizing the input features (Option D) can stabilize the training process by ensuring that all features contribute equally to the loss and gradients, which is particularly important in high-dimensional datasets with varying feature scales. While increasing the batch size (Option A) can reduce gradient variance, it may not address the root cause of oscillations. Increasing the learning rate (Option C) would likely exacerbate the problem. Learning rate scheduling (Option E) is a valid strategy but is not as immediately effective as directly adjusting the learning rate and normalizing the inputs.
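The two recommended fixes can be illustrated with a minimal sketch. The quadratic loss, scale constants, and feature columns below are made up for illustration (they are not the question's actual model): gradient descent on f(w) = 100·w² multiplies w by (1 − 200·lr) each step, so lr = 0.01 makes the iterate flip sign forever (oscillation), while lr = 0.001 shrinks it by 0.8 per step (option B). The second half standardizes a mixed-scale feature matrix to zero mean and unit variance (option D).

```python
import numpy as np

def descend(lr, steps=100, w0=1.0):
    """Gradient descent on the toy loss f(w) = 100 * w**2 (gradient 200 * w)."""
    w = w0
    for _ in range(steps):
        w -= lr * 200 * w  # update factor is (1 - 200 * lr)
    return w

w_high = descend(lr=0.01)   # factor = -1: w oscillates, |w| never shrinks
w_low = descend(lr=0.001)   # factor = 0.8: w converges toward 0

# Option D: standardize features with very different scales. The column
# scales here are invented to mimic "a mix of feature scales".
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(5000.0, 1200.0, size=256),   # large-scale feature
    rng.normal(0.002, 0.0005, size=256),    # tiny-scale feature
])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance
```

After standardization, both columns contribute gradients of comparable magnitude, which is what stabilizes training on mixed-scale data.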
Author: LeetQuiz Editorial Team
You are training a deep neural network on a large dataset using batch training. After several epochs, you notice that the training loss is oscillating significantly and the model is not converging as expected. The current batch size is 256, and the learning rate is set to 0.01. The dataset is high-dimensional with a mix of feature scales, and you are under constraints to minimize training time without compromising model performance. Which of the following adjustments would be MOST effective to ensure convergence? Choose the best two options.
A
Increase the batch size to 512 to reduce the variance in the gradient estimates.
B
Decrease the learning rate to 0.001 to allow for more precise updates to the model weights.
C
Increase the learning rate to 0.1 to speed up the convergence process.
D
Normalize the input features to have zero mean and unit variance to stabilize the training process.
E
Introduce learning rate scheduling to gradually decrease the learning rate as training progresses.
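Option E can also be sketched on a toy problem. The quadratic loss and decay constants below are assumptions for illustration only: starting from the too-high rate of 0.01, an exponential schedule lr_t = lr0 · gamma^t eventually brings the rate below the stability threshold, after which the iterate converges even though the initial steps overshoot.

```python
def descend_with_schedule(lr0, gamma, steps=200, w0=1.0):
    """Gradient descent on the toy loss f(w) = 100 * w**2 with an
    exponentially decayed learning rate lr_t = lr0 * gamma**t."""
    w = w0
    lr = lr0
    for _ in range(steps):
        w -= lr * 200 * w  # update factor is (1 - 200 * lr)
        lr *= gamma        # decay the rate each step
    return w

# lr starts at 0.01 (oscillating regime) but decays by 1% per step,
# so the update factor eventually enters (-1, 1) and w converges.
w_scheduled = descend_with_schedule(lr0=0.01, gamma=0.99)
```

This shows why scheduling is a valid but slower remedy: convergence only begins once the decayed rate crosses the stability threshold, whereas directly lowering the rate (option B) converges from the first step.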