
Answer-first summary for fast verification
Answer: Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the customer interaction records dataset, and specifying rules to identify and filter out irrelevant or duplicate customer interactions.
Option C is the correct answer. AWS Glue DataBrew lets you define data quality rules for the customer interaction records dataset without writing code. You create a new project, select the dataset, and specify rules that identify and filter out irrelevant or duplicate customer interactions. Because the rules are defined once, they can be rerun whenever the dataset is refreshed, which keeps the customer interaction data consistent over time. The other options fall short. Manually inspecting each record (Option A) does not scale to large datasets. Writing custom scripts (Option B) takes more effort and may miss some of the criteria that make an interaction irrelevant or a duplicate. Skipping data quality checks altogether (Option D) leads to poor data quality and incorrect analysis.
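To make the two kinds of rules concrete, here is a minimal plain-Python sketch of the logic such rules express. It is not the DataBrew API, because in DataBrew you define equivalent rules in the console. Everything specific here is an assumption: the record fields (`customer_id`, `interaction_type`, `timestamp`, `message`), the set of "relevant" interaction types, and the choice of duplicate key are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record shape; the real CRM export may use different fields.
@dataclass(frozen=True)
class Interaction:
    customer_id: str
    interaction_type: str  # assumed values such as "inquiry" or "feedback"
    timestamp: str         # assumed ISO 8601 string
    message: str

# Assumed relevance rule: only inquiries and feedback matter for analysis.
RELEVANT_TYPES = {"inquiry", "feedback"}

def is_relevant(rec: Interaction) -> bool:
    """Rule 1: keep inquiries/feedback that carry a non-blank message."""
    return rec.interaction_type.lower() in RELEVANT_TYPES and bool(rec.message.strip())

def dedup_key(rec: Interaction) -> tuple:
    """Rule 2: same customer, same timestamp and same message
    (case- and whitespace-insensitive) counts as a duplicate."""
    normalized = " ".join(rec.message.lower().split())
    return (rec.customer_id, rec.timestamp, normalized)

def apply_rules(records):
    """Filter records, keeping the first occurrence of each duplicate group,
    and count how many rows each rule removed."""
    seen = set()
    kept = []
    stats = {"irrelevant": 0, "duplicate": 0}
    for rec in records:
        if not is_relevant(rec):
            stats["irrelevant"] += 1
            continue
        key = dedup_key(rec)
        if key in seen:
            stats["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(rec)
    return kept, stats

# Small made-up sample to exercise both rules.
records = [
    Interaction("c1", "inquiry", "2024-01-01T10:00:00Z", "Where is my order?"),
    Interaction("c1", "inquiry", "2024-01-01T10:00:00Z", "where is  my order?"),  # duplicate after normalization
    Interaction("c2", "system_ping", "2024-01-01T11:00:00Z", "heartbeat"),         # irrelevant type
    Interaction("c3", "feedback", "2024-01-02T09:30:00Z", "   "),                  # blank message
    Interaction("c4", "feedback", "2024-01-02T09:45:00Z", "Great support!"),
]

kept, stats = apply_rules(records)
print([r.customer_id for r in kept])  # ['c1', 'c4']
print(stats)                          # {'irrelevant': 2, 'duplicate': 1}
```

Counting how many rows each rule removes mirrors the idea of a data quality check. A sudden jump in the duplicate or irrelevant count is a signal worth investigating, not just something to filter silently.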
Author: LeetQuiz Editorial Team
Your team is working on a data pipeline that processes data from a customer relationship management (CRM) system. The data includes customer interaction records with information about customer inquiries and feedback. You have been tasked with ensuring the data quality of the customer interaction records dataset. Describe the steps you would take to run data quality checks on the customer interaction records dataset and explain how you would define data quality rules to identify and filter out irrelevant or duplicate customer interactions.
A
Run data quality checks by manually inspecting each customer interaction record and identifying irrelevant or duplicate interactions.
B
Use AWS Glue to run data quality checks by writing custom scripts that identify irrelevant or duplicate interactions based on specific criteria.
C
Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the customer interaction records dataset, and specifying rules to identify and filter out irrelevant or duplicate customer interactions.
D
Ignore data quality checks and assume all customer interactions are relevant and unique.