What is the best strategy to avoid data leakage in machine learning?
Explanation:
To prevent data leakage, split the data first and fit every feature engineering step (for example scaling, encoding, imputation, or feature selection) on the training set only, then apply the fitted transformations unchanged to the validation and test sets. This ensures that the model's performance on validation and test sets accurately reflects its ability to generalize to unseen data. Data leakage occurs when information from outside the training set influences the model's training, leading to overly optimistic performance estimates. The other options either do not address the root cause of data leakage or may exacerbate the problem by encouraging overfitting or including irrelevant information.
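
As a rough illustration, the sketch below uses standardization as a stand-in for a feature engineering step. The helper names fit_scaler and apply_scaler, the synthetic data, and the 80/20 split are all assumptions made for this example, not part of the original question. The statistics are learned from the training rows only and then reused on the held-out rows; a leaky variant that computes them on the full dataset is shown for contrast.

```python
import random
import statistics


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # avoid division by zero for constant features
    return mean, std


def apply_scaler(values, mean, std):
    """Apply previously learned statistics without refitting."""
    return [(v - mean) / std for v in values]


# Synthetic single-feature dataset (hypothetical, for illustration only).
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(100)]

# Split BEFORE computing any statistics.
train, test = data[:80], data[80:]

# Correct: statistics come from the training set only.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)

# Leaky, shown for contrast: statistics computed on train and test together,
# so information about the test rows reaches the training features.
leaky_mean, leaky_std = fit_scaler(data)
leaky_train_scaled = apply_scaler(train, leaky_mean, leaky_std)

print("train-only statistics:", mean, std)
print("full-data statistics: ", leaky_mean, leaky_std)
```

The two sets of statistics differ because the second one has seen the test rows. With a real model the same pattern applies to any fitted transformation: fit it on the training split, then only apply it to the validation and test splits.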