
Answer-first summary for fast verification
Answer: Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
# Explanation

**Correct Answer: B**

AWS Glue is the most appropriate solution for this scenario.

## Why AWS Glue Is the Best Choice

1. **Serverless ETL service**: AWS Glue is a fully managed, serverless ETL service designed for data transformation tasks such as converting CSV to Parquet format.
2. **Minimal development effort**: AWS Glue provides built-in capabilities for common data transformations, including CSV-to-Parquet format conversion, through visual job creation or simple scripts.
3. **Automatic schema discovery**: An AWS Glue crawler automatically discovers the schema of the CSV files, reducing manual effort.
4. **S3 integration**: AWS Glue integrates natively with Amazon S3, making it easy to read from and write to S3 buckets.
5. **Scalability**: AWS Glue scales automatically to handle hundreds of files without manual cluster management.

## Why the Other Options Are Not Optimal

**Option A (Amazon EMR)**:
- Requires creating and managing an EMR cluster
- Requires writing a Spark application
- Involves more operational overhead and development effort

**Option C (AWS Batch)**:
- Requires creating job definitions with Bash syntax
- Involves more manual scripting and error handling
- Not specifically optimized for ETL transformations

**Option D (AWS Lambda)**:
- Lambda has a 15-minute execution time limit
- May not process hundreds of large files efficiently
- Requires more custom code for the data transformation
- Memory limitations may be a problem for large files

## Key Benefits of AWS Glue for This Use Case

- **No infrastructure management**: Fully serverless
- **Built-in transformations**: CSV-to-Parquet conversion is straightforward
- **Cost-effective**: Pay only for the resources consumed during job execution
- **Scheduling capabilities**: Jobs can easily be scheduled for daily processing
- **Error handling**: Built-in retry mechanisms and monitoring

This solution requires the least development effort while providing a robust, scalable, and managed solution for the data transformation requirements.
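To make the correct answer concrete, here is a minimal sketch of the kind of Glue ETL script that option B implies: it reads the table the crawler registered in the Data Catalog and writes Parquet to the transformed data bucket. The database name, table name, and bucket path are hypothetical placeholders, and the script assumes the AWS Glue PySpark runtime, so it is not runnable outside a Glue job.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job boilerplate: job name is passed in as a job argument.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV data via the Data Catalog table the crawler created.
# "reporting_db" and "daily_csv_files" are hypothetical names.
source = glue_context.create_dynamic_frame.from_catalog(
    database="reporting_db",
    table_name="daily_csv_files",
)

# Write the same records to the transformed data bucket as Parquet.
# The bucket path is a hypothetical placeholder.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://transformed-data-bucket/"},
    format="parquet",
)

job.commit()
```

The crawler keeps the catalog schema current as new files arrive, so the job itself contains no hand-written schema handling, which is where the "least development effort" advantage comes from.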
Author: LeetQuiz Editorial Team
A company's reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket. Which solution will meet these requirements with the LEAST development effort?
**A.** Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
**B.** Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
**C.** Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
**D.** Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.