
Answer-first summary for fast verification
Answer: Utilize AWS Glue for data discovery and ETL operations, directing output to the specified data bucket.
The correct answer is B. AWS Glue is a serverless, managed ETL service, so it takes far less development and operational effort than provisioning and maintaining an Amazon EMR cluster, managing AWS Batch jobs, or writing and maintaining custom conversion code in AWS Lambda. An AWS Glue crawler discovers the schema of the incoming .csv files and registers it in the Glue Data Catalog; a Glue ETL job then converts the files to Apache Parquet format and writes them to the target data bucket.
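The crawler-plus-ETL-job setup described above could be provisioned roughly as follows. This is a minimal sketch, not a definitive implementation: the bucket names, IAM role ARN, script location, database name, and the `--target_path` job argument are all hypothetical placeholders, and `provision` expects a Glue client such as the one returned by `boto3.client("glue")`.

```python
"""Sketch: request parameters for a Glue crawler and a Glue ETL job (CSV -> Parquet)."""

# Hypothetical placeholders: replace with your own buckets, role, and script.
SOURCE_PATH = "s3://example-incoming-csv/"
TARGET_PATH = "s3://example-data-parquet/"
GLUE_ROLE = "arn:aws:iam::123456789012:role/ExampleGlueRole"
SCRIPT_LOCATION = "s3://example-glue-scripts/csv_to_parquet.py"


def crawler_params(name, database):
    """Keyword arguments for create_crawler: catalog the incoming .csv files."""
    return {
        "Name": name,
        "Role": GLUE_ROLE,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": SOURCE_PATH}]},
    }


def job_params(name):
    """Keyword arguments for create_job: run a Spark ETL script stored in S3."""
    return {
        "Name": name,
        "Role": GLUE_ROLE,
        "Command": {
            "Name": "glueetl",  # Spark-based Glue ETL job type
            "ScriptLocation": SCRIPT_LOCATION,
            "PythonVersion": "3",
        },
        # Custom argument (an assumption) that the hypothetical script would read.
        "DefaultArguments": {"--target_path": TARGET_PATH},
    }


def provision(glue):
    """Create the crawler and the job with a Glue client, e.g. boto3.client("glue")."""
    glue.create_crawler(**crawler_params("csv-crawler", "incoming_db"))
    glue.create_job(**job_params("csv-to-parquet"))
```

With real credentials, calling `provision(boto3.client("glue"))` would register both resources. The ETL script itself, assumed here to live at `SCRIPT_LOCATION`, would typically read the cataloged table and write it back out with `format="parquet"` to the target path.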
Author: LeetQuiz Editorial Team
A company receives daily .csv file deliveries in an Amazon S3 bucket and needs to convert them to Apache Parquet format, with minimal development effort, for storage in a separate data bucket. What is the most efficient solution?
A
Set up an Amazon EMR cluster with Spark for data transformation and use EMRFS to store results in the target bucket.
B
Utilize AWS Glue for data discovery and ETL operations, directing output to the specified data bucket.
C
Employ AWS Batch with a Bash job definition for the transformation, submitted as an array job that writes its output to the target bucket.
D
Develop an AWS Lambda function to handle the conversion and configure S3 event notifications to trigger it automatically.