
Answer-first summary for fast verification
Answer: Develop a comprehensive data validation framework that includes checks for data type consistency, removal of outliers, correction of data entry errors, and validation against international data protection standards.
Option B is the best choice because a comprehensive data validation framework addresses both data quality and compliance: it checks data type consistency, removes outliers, corrects data entry errors, and validates records against international data protection standards. Option A is insufficient because a basic SQL deduplication script covers only one narrow quality problem and performs no further analysis. Option C risks losing valuable data, since records are excluded automatically with no manual review or recovery path. Option D is impractical because manual inspection does not scale to millions of transactions per day. Option B is therefore the most effective and scalable way to ensure data accuracy, consistency, and compliance.
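To make the idea of a validation framework more concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the SCHEMA fields, the function names, the consent flag standing in for a compliance rule, and the 3.5 modified z-score threshold are hypothetical and are not taken from the question or from any specific regulation. The sketch also flags outliers for review rather than deleting them, which keeps the data recoverable.

```python
from statistics import median
from typing import Any

Record = dict[str, Any]

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"customer_id": int, "email": str, "amount": float, "consent": bool}


def check_types(record: Record) -> list[str]:
    """Report fields that are missing or hold a value of the wrong type."""
    issues = []
    for name, expected in SCHEMA.items():
        if name not in record:
            issues.append(f"{name}: missing")
        elif not isinstance(record[name], expected):
            issues.append(f"{name}: expected {expected.__name__}")
    return issues


def correct_entry_errors(record: Record) -> Record:
    """Return a copy with a simple fix applied: trim and lowercase the email."""
    fixed = dict(record)
    if isinstance(fixed.get("email"), str):
        fixed["email"] = fixed["email"].strip().lower()
    return fixed


def check_compliance(record: Record) -> list[str]:
    """Placeholder rule (an assumption, not a real regulatory check)."""
    return [] if record.get("consent") is True else ["consent: not recorded"]


def outlier_indexes(records: list[Record], threshold: float = 3.5) -> set[int]:
    """Flag amounts whose modified z-score (median/MAD based) exceeds threshold."""
    amounts = [r["amount"] for r in records if isinstance(r.get("amount"), float)]
    if not amounts:
        return set()
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return set()  # no spread to measure against, so flag nothing
    return {
        i
        for i, r in enumerate(records)
        if isinstance(r.get("amount"), float)
        and 0.6745 * abs(r["amount"] - med) / mad > threshold
    }


def validate(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Split records into accepted ones and quarantined ones with their issues."""
    cleaned = [correct_entry_errors(r) for r in records]
    outliers = outlier_indexes(cleaned)
    accepted, quarantined = [], []
    for i, record in enumerate(cleaned):
        issues = check_types(record) + check_compliance(record)
        if i in outliers:
            issues.append("amount: outlier")
        if issues:
            quarantined.append((record, issues))  # kept for review, not deleted
        else:
            accepted.append(record)
    return accepted, quarantined
```

Calling validate(records) on a list of dictionaries returns the clean records plus a quarantine list that pairs each rejected record with the reasons it failed. Quarantining instead of dropping is a deliberate design choice here: it avoids the silent data loss that makes Option C risky, while still running automatically at a scale that Option D cannot reach.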
Author: LeetQuiz Editorial Team
You are a data engineer for a large e-commerce platform that processes millions of transactions daily. The company is expanding into new markets and needs to ensure that its data is accurate, consistent, and compliant with international data protection regulations. The dataset includes customer information, transaction details, and product inventory. Given the scale of data and the need for compliance, which of the following approaches would BEST implement a data cleansing process to meet these requirements? Choose one option.
A. Use a basic SQL script to identify and remove duplicate customer records without further analysis.
B. Develop a comprehensive data validation framework that includes checks for data type consistency, removal of outliers, correction of data entry errors, and validation against international data protection standards.
C. Automatically exclude any transaction records that do not match predefined criteria, without manual review or consideration for data recovery.
D. Assign a team to manually inspect each record for accuracy and compliance, despite the volume of data.