
You are responsible for ensuring the quality of a dataset used for a machine learning model. The dataset contains customer information, including names, addresses, and transaction history. Which of the following steps should you take to validate the data and ensure its completeness, consistency, accuracy, and integrity?
A
Perform data sampling to identify any missing or inconsistent data points and then manually correct them.
B
Use a data profiling tool to analyze the dataset and identify any anomalies or inconsistencies in the data.
C
Implement a data validation process that checks for data completeness, consistency, accuracy, and integrity using automated scripts.
D
Rely on the data source to ensure the quality of the data and assume that it is accurate and complete.
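
For reference, the kind of automated validation script described in option C might look roughly like the sketch below. It is only an illustration under stated assumptions: the field names (`customer_id`, `name`, `address`, `txn_amount`, `txn_date`), the sample records, and the specific rules (for example, treating negative amounts as suspect) are hypothetical and do not come from the question.

```python
from datetime import date

# Hypothetical customer records; field names and values are assumptions.
records = [
    {"customer_id": 1, "name": "Ada Lovelace", "address": "12 Main St",
     "txn_amount": 40.0, "txn_date": date(2024, 1, 5)},
    {"customer_id": 2, "name": "", "address": "9 Oak Ave",
     "txn_amount": 15.5, "txn_date": date(2024, 2, 1)},
    {"customer_id": 2, "name": "Alan Turing", "address": "3 Elm Rd",
     "txn_amount": -7.0, "txn_date": date(2024, 2, 3)},
    {"customer_id": 4, "name": "Grace Hopper", "address": None,
     "txn_amount": 22.0, "txn_date": date(2999, 1, 1)},
]

REQUIRED = ("customer_id", "name", "address", "txn_amount", "txn_date")


def validate(rows, today):
    """Return a list of (row_index, dimension, message) findings."""
    findings = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required field is present and non-empty.
        for field in REQUIRED:
            if row.get(field) in (None, ""):
                findings.append((i, "completeness", f"missing {field}"))

        # Consistency: the amount, when present, must be numeric.
        amount = row.get("txn_amount")
        if amount is not None and not isinstance(amount, (int, float)):
            findings.append((i, "consistency", "txn_amount is not numeric"))

        # Accuracy: values must be plausible (assumed business rules).
        if isinstance(amount, (int, float)) and amount < 0:
            findings.append((i, "accuracy", "negative txn_amount"))
        txn_date = row.get("txn_date")
        if isinstance(txn_date, date) and txn_date > today:
            findings.append((i, "accuracy", "txn_date in the future"))

        # Integrity: customer_id must be unique across the dataset.
        cid = row.get("customer_id")
        if cid in seen_ids:
            findings.append((i, "integrity", f"duplicate customer_id {cid}"))
        seen_ids.add(cid)
    return findings


findings = validate(records, today=date(2024, 6, 1))
for row_index, dimension, message in findings:
    print(f"row {row_index}: [{dimension}] {message}")
```

Each finding is tagged with the quality dimension it violates, so a script like this can run on every data refresh and fail a pipeline or raise an alert instead of relying on spot checks or manual correction.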