
You are developing a regression model to predict housing prices. The model performs exceptionally well during training but fails to generalize during testing, leading to unsatisfactory results. A preliminary investigation reveals data quality issues, including incorrect and missing values. Your team decides to implement a data validation tool to address these issues. Considering the constraints of budget, time, and the need for scalability, which of the following issues is NOT directly related to data validation? Choose the best option.
A
Duplicate examples in the dataset, which could skew the model's understanding of feature importance.
B
Feature values that fall outside expected ranges, potentially leading to inaccurate model predictions.
C
Instances where values are missing from the dataset, affecting the model's ability to learn from all available information.
D
Incorrect labels in the training data, which directly mislead the model during the learning process.
E
The presence of categorical variables that require encoding before being used in the model.
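
For readers unfamiliar with what a data validation tool typically checks, here is a minimal sketch in plain Python. It is not taken from any particular tool: the field names (`sqft`, `bedrooms`, `price`), the ranges in `EXPECTED_RANGES`, and the function `validate_rows` are all hypothetical, chosen only to illustrate duplicate, missing-value, and range checks on housing data. A range check on the target (`price`) can catch only some incorrect labels, namely those that are implausible on their face.

```python
from typing import Any

# Hypothetical schema: expected numeric range per field (illustrative values only).
EXPECTED_RANGES = {
    "sqft": (100, 20_000),
    "bedrooms": (0, 20),
    "price": (1_000, 50_000_000),  # the regression target (label)
}


def validate_rows(rows: list[dict[str, Any]]) -> dict[str, list[int]]:
    """Return the indices of rows flagged by each validation check."""
    report: dict[str, list[int]] = {"duplicate": [], "missing": [], "out_of_range": []}
    seen = set()
    for i, row in enumerate(rows):
        # Duplicate check: an identical row was already seen earlier.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            report["duplicate"].append(i)
        seen.add(key)

        # Missing-value check: any schema field absent or None.
        if any(row.get(name) is None for name in EXPECTED_RANGES):
            report["missing"].append(i)

        # Range check: any present value outside its expected interval.
        for name, (lo, hi) in EXPECTED_RANGES.items():
            value = row.get(name)
            if value is not None and not lo <= value <= hi:
                report["out_of_range"].append(i)
                break  # flag each row at most once
    return report


rows = [
    {"sqft": 1500, "bedrooms": 3, "price": 300_000},
    {"sqft": 1500, "bedrooms": 3, "price": 300_000},  # exact duplicate of row 0
    {"sqft": None, "bedrooms": 2, "price": 200_000},  # missing value
    {"sqft": 900, "bedrooms": 45, "price": 150_000},  # feature out of range
    {"sqft": 1200, "bedrooms": 2, "price": -5},       # implausible label
]
report = validate_rows(rows)
print(report)
```

A check like this only flags suspicious rows; deciding whether to drop, repair, or keep them is a separate step.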