
Answer-first summary for fast verification
Answer: Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the user engagement dataset, and specifying rules to identify and filter out fake engagements.
Option C is the correct answer. To ensure the quality of the user engagement dataset, define data quality rules in AWS Glue DataBrew: create a new project, select the dataset, and specify rules that identify and filter out fake engagements. Manually inspecting each engagement (Option A) does not scale to large datasets. Writing custom scripts (Option B) is time-consuming and may miss fake-engagement patterns. Skipping data quality checks (Option D) risks poor data quality and incorrect analysis.
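As a sketch of what Option C looks like in practice, the snippet below builds a DataBrew-style ruleset definition for fake-engagement checks. The column names (`likes`, `click_through_rate`, `user_id`), thresholds, dataset ARN, and rule names are all hypothetical; the check-expression syntax follows the general shape of DataBrew's documented `CheckExpression`/`SubstitutionMap` pattern, but verify exact expressions against the DataBrew documentation before use.

```python
# Illustrative DataBrew-style data quality ruleset for a hypothetical
# user engagement dataset. Column names, thresholds, and the ARN are
# placeholders, not values from a real account.
ruleset = {
    "Name": "fake-engagement-checks",
    # TargetArn would point at the DataBrew dataset being validated.
    "TargetArn": "arn:aws:databrew:us-east-1:123456789012:dataset/user-engagement",
    "Rules": [
        {
            # Engagement counts should never be negative.
            "Name": "non-negative-likes",
            "CheckExpression": ":col1 >= :val1",
            "SubstitutionMap": {":col1": "`likes`", ":val1": "0"},
        },
        {
            # A click-through rate above 1.0 suggests fabricated engagement.
            "Name": "plausible-ctr",
            "CheckExpression": ":col1 between :val1 and :val2",
            "SubstitutionMap": {
                ":col1": "`click_through_rate`",
                ":val1": "0",
                ":val2": "1.0",
            },
        },
        {
            # Every engagement row must carry a user identifier.
            "Name": "user-id-present",
            "CheckExpression": "MISSING_VALUES_PERCENTAGE(:col1) == :val1",
            "SubstitutionMap": {":col1": "`user_id`", ":val1": "0"},
        },
    ],
}

# With boto3 installed and AWS credentials configured, the ruleset could be
# registered and evaluated by a profile job (call not executed here):
#   import boto3
#   boto3.client("databrew").create_ruleset(**ruleset)

print(len(ruleset["Rules"]))  # prints 3
```

Defining checks declaratively like this, rather than in ad hoc scripts (Option B), keeps the rules visible, reusable, and attachable to scheduled DataBrew profile jobs.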
Author: LeetQuiz Editorial Team
Your team is working on a data pipeline that processes data from a social media platform. You have been tasked with ensuring the data quality of the user engagement dataset. Describe the steps you would take to run data quality checks on the user engagement dataset and explain how you would define data quality rules to identify and filter out fake user engagements.
A
Run data quality checks by manually inspecting each user engagement and identifying fake engagements.
B
Use AWS Glue to run data quality checks by writing custom scripts that identify fake engagements based on specific patterns or anomalies.
C
Define data quality rules using AWS Glue DataBrew by creating a new project, selecting the user engagement dataset, and specifying rules to identify and filter out fake engagements.
D
Ignore data quality checks and assume all user engagements are genuine.