
Answer-first summary for fast verification
Answer: Implement a data validation process that checks for data completeness, consistency, accuracy, and integrity using automated scripts.
Option C is correct: automated scripts apply the same checks for completeness, consistency, accuracy, and integrity to every record, every time the dataset is refreshed, so the data reaching the machine learning model is validated systematically rather than ad hoc. Option A does not scale, because sampling examines only part of the data and can miss issues, and manual correction is slow. Option B is a good first step, but profiling only surfaces anomalies and does not replace a validation process. Option D is unreliable, because the data source may not always provide accurate and complete data.
Author: LeetQuiz Editorial Team
You are responsible for ensuring the data quality of a dataset that is being used for a machine learning model. The dataset contains customer information, including names, addresses, and transaction history. Which of the following steps should you take to validate the data and ensure its completeness, consistency, accuracy, and integrity?
A. Perform data sampling to identify any missing or inconsistent data points and then manually correct them.
B. Use a data profiling tool to analyze the dataset and identify any anomalies or inconsistencies in the data.
C. Implement a data validation process that checks for data completeness, consistency, accuracy, and integrity using automated scripts.
D. Rely on the data source to ensure the quality of the data and assume that it is accurate and complete.
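To make option C concrete, here is a minimal sketch of what an automated validation script might look like for this kind of customer dataset. The field names (`customer_id`, `name`, `address`, `transaction_id`, `date`, `amount`), the `validate` function, and the specific rules are all assumptions chosen for illustration; the question does not specify a schema, and a real pipeline would likely use more rules or a dedicated validation library.

```python
from datetime import date

# Hypothetical schema: these field names are assumptions, not from the question.
REQUIRED_FIELDS = ("customer_id", "name", "address")


def validate(customers, transactions):
    """Return a list of (check, record_id, message) tuples; empty means clean."""
    issues = []
    seen_ids = set()

    for row in customers:
        cid = row.get("customer_id")
        # Completeness: every required field is present and non-blank.
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if value is None or not str(value).strip():
                issues.append(("completeness", cid, f"missing {field}"))
        # Integrity: customer_id acts as a primary key and must be unique.
        if cid is not None:
            if cid in seen_ids:
                issues.append(("integrity", cid, "duplicate customer_id"))
            seen_ids.add(cid)

    for tx in transactions:
        tid = tx.get("transaction_id")
        # Integrity: each transaction must reference a known customer.
        if tx.get("customer_id") not in seen_ids:
            issues.append(("integrity", tid, "unknown customer_id"))
        # Consistency: dates must parse as ISO calendar dates.
        try:
            date.fromisoformat(str(tx.get("date")))
        except ValueError:
            issues.append(("consistency", tid, "date is not a valid ISO date"))
        # Accuracy: a simple plausibility rule (assumed): amounts are positive numbers.
        amount = tx.get("amount")
        if not isinstance(amount, (int, float)) or amount <= 0:
            issues.append(("accuracy", tid, "amount is not a positive number"))

    return issues
```

Calling `validate(customers, transactions)` on lists of dictionaries returns one tuple per violation, tagged with the quality dimension it belongs to. Because the checks are code, they can run unchanged on every new batch of data, which is the scalability advantage over the manual approach in option A.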