
Correct answer: C

Explanation:
AWS Glue DataBrew is built for visual, no-code data profiling and transformation: a DataBrew project lets the team explore the data, a recipe captures the point-and-click cleanup steps (filling missing fields, splitting concatenated name-and-address values, dropping unneeded columns), and a scheduled DataBrew job reruns that recipe automatically. The other options all require writing code: loading into Redshift means cleaning with SQL, Lambda means custom Python parsing logic, and EMR adds Spark development plus cluster management overhead.
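Although the scenario calls for a console-only workflow, the same 6-hour cadence can be expressed programmatically through the DataBrew `CreateSchedule` API, which takes an EventBridge-style cron expression. A minimal boto3 sketch follows; the job and schedule names are hypothetical placeholders, and the recipe job is assumed to exist already.

```python
# Hedged sketch: attach a recurring 6-hour schedule to an existing
# DataBrew recipe job. Names below are illustrative, not from the question.

# DataBrew uses EventBridge-style cron: minute hour day-of-month month day-of-week year
EVERY_6_HOURS = "cron(0 */6 * * ? *)"

def create_6_hour_schedule(job_name: str, schedule_name: str) -> dict:
    """Create a DataBrew schedule that runs job_name every 6 hours."""
    # boto3 is imported inside the function so the constants above can be
    # inspected without the AWS SDK installed.
    import boto3

    client = boto3.client("databrew")
    return client.create_schedule(
        Name=schedule_name,
        JobNames=[job_name],          # one schedule can trigger multiple jobs
        CronExpression=EVERY_6_HOURS,
    )

# Example (requires AWS credentials and an existing job):
# create_6_hour_schedule("orders-cleanup-job", "orders-every-6h")
```

The cron expression uses `?` for day-of-week because EventBridge-style cron requires `?` in either the day-of-month or day-of-week field.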
A retail analytics team at Riverstone Gear plans to prepare a customer orders dataset for machine learning using AWS Glue DataBrew. The raw files in Amazon S3 contain missing fields, concatenated name-and-address values, and columns that are no longer needed. The team wants to profile and fix the data with point-and-click steps and have the same workflow run automatically every 6 hours without writing code. Which approach should they take?
A. Load the data into Amazon Redshift, clean it with SQL statements, and trigger recurring runs using Amazon EventBridge
B. Create AWS Lambda functions in Python to parse and cleanse the files, and schedule invocations with Amazon EventBridge
C. Build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence
D. Provision an Amazon EMR cluster to run Apache Spark data cleaning jobs and orchestrate the schedule with AWS Step Functions