
Explanation:
The correct choice is to build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence. AWS Glue DataBrew is purpose-built for visual, no-code data profiling and transformation through reusable recipes, and DataBrew jobs support recurring schedules, so the whole workflow can run automatically without code.

Loading the data into Amazon Redshift, cleaning it with SQL statements, and triggering recurring runs with Amazon EventBridge is not ideal because it requires writing SQL and lacks the visual, point-and-click recipe workflow the team requested. Creating AWS Lambda functions in Python to parse and cleanse the files, scheduled with Amazon EventBridge, is incorrect because it introduces custom code and operational overhead, which conflicts with the no-code requirement. Provisioning an Amazon EMR cluster to run Apache Spark data cleaning jobs orchestrated with AWS Step Functions adds unnecessary complexity and is code-heavy for a task DataBrew can handle visually, without managing clusters.

When you see requirements for visual, no-code data preparation with repeatable steps and easy scheduling, think AWS Glue DataBrew and its recipes plus scheduled jobs.
A retail analytics team at Riverstone Gear plans to prepare a customer orders dataset for machine learning using AWS Glue DataBrew. The raw files in Amazon S3 contain missing fields, concatenated name-and-address values, and columns that are no longer needed. The team wants to profile and fix the data with point-and-click steps and have the same workflow run automatically every 6 hours without writing code. Which approach should they take?
A
Load the data into Amazon Redshift, clean it with SQL statements, and trigger recurring runs using Amazon EventBridge
B
Create AWS Lambda functions in Python to parse and cleanse the files, and schedule invocations with Amazon EventBridge
C
Build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence
D
Provision an Amazon EMR cluster to run Apache Spark data cleaning jobs and orchestrate the schedule with AWS Step Functions
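Although the point of DataBrew is that the console requires no code, the same schedule can be expressed programmatically. The sketch below builds the request parameters for the DataBrew `CreateSchedule` API, which takes an EventBridge-style cron expression; the job and schedule names are hypothetical, not from the question.

```python
# Hypothetical sketch: building a DataBrew schedule request for an
# every-6-hours cadence. Job/schedule names are assumptions for illustration.

def schedule_params(job_name: str, hours: int) -> dict:
    """Build the CreateSchedule request for an every-N-hours cadence."""
    # DataBrew schedules use EventBridge-style 6-field cron expressions:
    # minute 0 of every N-th hour, every day.
    return {
        "Name": f"{job_name}-every-{hours}h",
        "JobNames": [job_name],
        "CronExpression": f"cron(0 0/{hours} * * ? *)",
    }

params = schedule_params("orders-cleanup-job", 6)
print(params["CronExpression"])  # cron(0 0/6 * * ? *)
```

In practice these parameters would be passed to `boto3.client("databrew").create_schedule(**params)` after the recipe job exists; the console's "Associate schedule" option does the same thing without any code.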