
Answer-first summary for fast verification
Answer: Normalize the data by scaling all features to a 0 to 1 range, ensuring each feature contributes equally to the model's learning process without altering the original data distribution significantly.
Normalization is the most effective strategy in this scenario because it directly addresses the issue of feature scale disparity by scaling all features to a common range (0 to 1). This approach ensures that each feature contributes equally to the model's learning process, mitigating the risk of overfitting due to high-magnitude features. While logarithmic transformation and PCA have their uses, they are not as universally applicable or as straightforward for this specific problem. Binning, although a form of data simplification, risks losing important details in the data, making normalization the preferred choice.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
In the process of developing a machine learning model aimed at forecasting stock market trends, you identify that certain features exhibit a significantly wider range of values compared to others. This disparity in feature scales could potentially lead to overfitting, with the model disproportionately weighting the high-magnitude features. Considering the need for a solution that is both effective and efficient, and taking into account constraints such as computational cost and the preservation of feature information, which of the following strategies would be the MOST appropriate to implement? (Choose one correct option)
A
Apply a logarithmic transformation to adjust the scale of features, which can be particularly useful for features with exponential distributions.
B
Employ principal component analysis (PCA) to reduce the dimensionality of the data, thereby indirectly addressing the scale disparity among features.
C
Normalize the data by scaling all features to a 0 to 1 range, ensuring each feature contributes equally to the model's learning process without altering the original data distribution significantly.
D
Implement a binning system to replace each feature's value with a corresponding bin number, which simplifies the data but may lead to loss of granularity.