
Answer-first summary for fast verification
Answer: Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the customer reviews dataset, and specifying rules to identify genuine reviews and filter out spam.
Option C is the correct answer. AWS Glue DataBrew lets you define data quality rules against the customer reviews dataset without writing code: you create a project, select the dataset, and specify rules that describe what a valid review looks like. In DataBrew, such rules are grouped into a ruleset associated with the dataset and evaluated when a profile job runs, so every run reports which rules passed and which failed. Rules of this kind flag likely spam (for example, empty text, out-of-range ratings, or duplicated review bodies) rather than proving that a review is genuine. Manually inspecting each review (Option A) does not scale to a large e-commerce dataset. Writing custom AWS Glue scripts (Option B) is time-consuming to build and maintain, and keyword matching may miss spam patterns it was not written for. Ignoring data quality checks (Option D) lets spam flow into downstream analysis and leads to incorrect results.
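To make the idea of rule-based checks concrete, here is a minimal local sketch in plain Python. It is not the DataBrew API; in DataBrew the equivalent rules would be configured in a ruleset rather than written by hand. The sample rows, column names (`review_id`, `rating`, `text`), and rule names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; column names are assumptions, not from the question.
reviews = [
    {"review_id": "r1", "rating": 5, "text": "Great blender, crushed ice easily."},
    {"review_id": "r2", "rating": 5, "text": "BUY CHEAP WATCHES http://spam.example"},
    {"review_id": "r3", "rating": 9, "text": "Works fine."},
    {"review_id": "r4", "rating": 4, "text": "   "},
    {"review_id": "r5", "rating": 1, "text": "Best product ever!!!"},
    {"review_id": "r6", "rating": 1, "text": "Best product ever!!!"},
]


def failed_rules(row, text_counts):
    """Return the names of the illustrative rules that one review row violates."""
    failures = []
    text = row["text"].strip()
    if not text:
        failures.append("text_not_empty")
    if not 1 <= row["rating"] <= 5:
        failures.append("rating_in_range")
    lowered = text.lower()
    if "http://" in lowered or "https://" in lowered:
        failures.append("no_links")
    # Identical non-empty review bodies are treated as a possible spam signal.
    if text and text_counts[text] > 1:
        failures.append("text_is_unique")
    return failures


# Count how often each (stripped) review body appears across the dataset.
text_counts = Counter(r["text"].strip() for r in reviews)

# Map each review to the rules it fails; an empty list means it passed all rules.
report = {r["review_id"]: failed_rules(r, text_counts) for r in reviews}
passed = [rid for rid, fails in report.items() if not fails]

print(report)
print(passed)
```

Each rule is a simple, declarative condition on a row or on the dataset as a whole, which is the same shape of check a managed ruleset expresses. Reviews that fail a rule are candidates for filtering or further inspection, not confirmed spam.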
Author: LeetQuiz Editorial Team
Your team is working on a data pipeline that processes data from a large e-commerce platform. You have been tasked with ensuring the data quality of the customer reviews dataset. Describe the steps you would take to run data quality checks on the customer reviews dataset and explain how you would define data quality rules to ensure the reviews are genuine and not spam.
A. Run data quality checks by manually inspecting each review and identifying spam reviews.
B. Use AWS Glue to run data quality checks by writing custom scripts that identify spam reviews based on specific keywords or patterns.
C. Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the customer reviews dataset, and specifying rules to identify genuine reviews and filter out spam.
D. Ignore data quality checks and assume all reviews are genuine.