
Explanation:
AWS Glue Python shell jobs run on a single node and are designed for lightweight ETL workloads with small datasets (under ~100 MB per file is ideal). They are significantly cheaper than Spark-based Glue jobs (which provision a distributed cluster) or EMR clusters, and avoid the operational overhead of managing EKS or EMR. For daily small CSV files, pandas in a Python shell job is the most cost-effective choice.
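A minimal sketch of the kind of transform such a job might run. The column names, the cleanup steps, and the S3 access shown in the trailing comment are all illustrative assumptions, not part of the question:

```python
import io

import pandas as pd


def transform(csv_bytes: bytes) -> pd.DataFrame:
    """Hypothetical transform for one small daily CSV upload."""
    df = pd.read_csv(io.BytesIO(csv_bytes))
    # Drop rows where every field is empty.
    df = df.dropna(how="all")
    # Normalize column names to snake_case.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df


# In an actual Glue Python shell job, the bytes would come from S3, e.g.:
#   obj = boto3.client("s3").get_object(Bucket="my-bucket", Key="daily/file.csv")
#   df = transform(obj["Body"].read())
# (bucket and key names here are placeholders)
```

Because each file is under 100 MB, the whole object fits comfortably in the memory of a single Python shell worker, which is what makes the distributed options (Spark, EMR, EKS) unnecessary here.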
Question 44.
A data engineer needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB. Which solution will meet these requirements MOST cost-effectively?
A. Write a custom Python application. Host the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
B. Write a PySpark ETL script. Host the script on an Amazon EMR cluster.
C. Write an AWS Glue PySpark job. Use Apache Spark to transform the data.
D. Write an AWS Glue Python shell job. Use pandas to transform the data.