
Answer-first summary for fast verification
Answer: Implement a data validation process that checks for data completeness, consistency, accuracy, and integrity before training the models.
Option D is correct: a validation process that checks completeness, consistency, accuracy, and integrity before training ensures that the dataset is of high quality and suitable for machine learning. Option A does not scale to a large dataset, and sampling can miss issues that fall outside the sample. Option B is a good first step, but profiling only surfaces anomalies, so it should be followed by a data validation process. Option C is incorrect because data quality is crucial to the performance of machine learning models.
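To make the four checks in Option D concrete, here is a minimal sketch in plain Python. The function name `validate_records`, its parameters, the field names, and the sample rows are all hypothetical and chosen only for illustration. It also assumes one possible reading of each check: completeness as "required fields are filled", consistency as "numeric fields hold numbers", accuracy as "values fall in a plausible range", and integrity as "the key is unique". A real project would define these rules from its own schema.

```python
from collections import Counter


def validate_records(records, required, ranges, key):
    """Return, for each check, the sorted row indices that fail it.

    Hypothetical helper: the rules below are assumptions for illustration.
    """
    issues = {name: set() for name in
              ("completeness", "consistency", "accuracy", "integrity")}
    # Count key values once so duplicate keys can be flagged per row.
    key_counts = Counter(rec.get(key) for rec in records)

    for i, rec in enumerate(records):
        # Completeness: every required field is present and non-empty.
        if any(rec.get(field) in (None, "") for field in required):
            issues["completeness"].add(i)

        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value in (None, ""):
                continue  # missing values are reported under completeness
            # Consistency: a ranged field must hold a number, not text.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                issues["consistency"].add(i)
            # Accuracy: the number must fall inside its plausible range.
            elif not low <= value <= high:
                issues["accuracy"].add(i)

        # Integrity: the key must identify exactly one row.
        if key_counts[rec.get(key)] > 1:
            issues["integrity"].add(i)

    return {name: sorted(rows) for name, rows in issues.items()}


# Made-up sample rows, one per kind of problem.
records = [
    {"id": 1, "age": 34, "label": "yes"},
    {"id": 2, "age": None, "label": "no"},    # missing age
    {"id": 3, "age": "41", "label": "yes"},   # age stored as text
    {"id": 4, "age": 212, "label": "no"},     # implausible age
    {"id": 1, "age": 29, "label": "yes"},     # duplicate id
]

report = validate_records(
    records,
    required=["id", "age", "label"],
    ranges={"age": (0, 120)},
    key="id",
)
print(report)
```

A training pipeline could run a check like this as a gate and refuse to train while any list in the report is non-empty, which is what separates validation from the one-off inspection described in Options A and B.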
Author: LeetQuiz Editorial Team
You are working on a machine learning project that requires training models on a large dataset. Which of the following steps should you take to ensure the data quality of the dataset used for training?
A
Perform data sampling to identify any missing or inconsistent data points and then manually correct them before training the models.
B
Use a data profiling tool to analyze the dataset and identify any anomalies or inconsistencies in the data before training the models.
C
Assume that the dataset is of high quality and train the models without any data validation or profiling.
D
Implement a data validation process that checks for data completeness, consistency, accuracy, and integrity before training the models.