
Answer-first summary for fast verification
Answer: Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the dataset, and specifying the rules to be applied.
Option C is the correct answer. AWS Glue DataBrew lets you define reusable data quality rules: create a new project, select the dataset, and specify the rules to be applied (for example, completeness or value-range checks). Manually inspecting the data (Option A) does not scale to large datasets. Writing custom scripts (Option B) is time-consuming to maintain and may miss data quality issues that built-in rules catch. Ignoring data quality checks (Option D) is not recommended, as it can lead to poor data quality and incorrect analysis downstream.
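As a concrete illustration, DataBrew data quality rules can also be defined programmatically as a ruleset via the AWS SDK. The sketch below builds a `CreateRuleset` request payload with two rules, a not-null check and a non-negative range check. The dataset ARN, ruleset name, and column names (`order_id`, `amount`) are illustrative assumptions, not values from the question.

```python
# Hedged sketch: defining an AWS Glue DataBrew data quality ruleset.
# The ruleset name, dataset ARN, and column names below are hypothetical.
import json

def build_ruleset(dataset_arn: str) -> dict:
    """Build a CreateRuleset request payload for AWS Glue DataBrew."""
    return {
        "Name": "orders-quality-rules",   # illustrative ruleset name
        "TargetArn": dataset_arn,         # ARN of the DataBrew dataset to check
        "Rules": [
            {
                # Completeness check: order_id must never be null.
                "Name": "order_id-not-null",
                "CheckExpression": ":col1 is not null",
                "SubstitutionMap": {":col1": "`order_id`"},
            },
            {
                # Range check: amount must be non-negative.
                "Name": "amount-non-negative",
                "CheckExpression": ":col1 >= :val1",
                "SubstitutionMap": {":col1": "`amount`", ":val1": "0"},
            },
        ],
    }

ruleset = build_ruleset(
    "arn:aws:databrew:us-east-1:123456789012:dataset/orders"  # hypothetical ARN
)
print(json.dumps(ruleset, indent=2))

# To register the ruleset you would call:
#   boto3.client("databrew").create_ruleset(**ruleset)
# and then attach it to a DataBrew profile job so the checks run
# each time the dataset is profiled.
```

Keeping the rules in a ruleset (rather than in ad hoc scripts, as in Option B) makes them reusable across profile jobs and visible in the DataBrew console.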
Author: LeetQuiz Editorial Team
Your team is working on a data pipeline that processes large datasets using AWS Glue. You have been tasked with ensuring the data quality of the incoming data. Describe the steps you would take to run data quality checks while processing the data, and explain how you would define data quality rules using AWS Glue DataBrew.
A
Run data quality checks by manually inspecting the data and identifying any inconsistencies.
B
Use AWS Glue to run data quality checks by writing custom scripts that check for empty fields and other data quality issues.
C
Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the dataset, and specifying the rules to be applied.
D
Ignore data quality checks and focus solely on processing the data as quickly as possible.