
A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
A
Create an AWS Lambda function to download the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Invoke the Lambda function for each S3 PUT event.
B
Create an Apache Spark job to read the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the Spark job.
C
Create an AWS Glue table and an AWS Glue crawler for the S3 bucket where the application places the .csv files. Schedule an AWS Lambda function to periodically use Amazon Athena to query the AWS Glue table, convert the query results into Parquet format, and place the output files into an S3 bucket.
D
Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.
Explanation:
Correct Answer: D
Why Option D is the best solution with the LEAST operational overhead:
AWS Glue is a serverless ETL service designed specifically for data transformation tasks like converting CSV to Parquet format. It handles the infrastructure management automatically.
Automatic scaling: AWS Glue can process hundreds of 1 GB files per hour without manual capacity management.
Built-in Parquet support: AWS Glue has native support for reading CSV and writing Parquet format.
Event-driven architecture: Using Lambda to trigger the Glue ETL job for each S3 PUT event ensures immediate processing without polling or scheduling delays.
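The Lambda trigger described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the job name `csv-to-parquet`, the output prefix, and the `--input_path` / `--output_path` argument names are hypothetical, and the optional `glue` parameter exists only so the handler can be exercised without AWS credentials.

```python
import os
import urllib.parse

# Hypothetical names; set these to match your own Glue job and output bucket.
JOB_NAME = os.environ.get("GLUE_JOB_NAME", "csv-to-parquet")
OUTPUT_PREFIX = os.environ.get("OUTPUT_PREFIX", "s3://example-output-bucket/parquet/")


def build_job_arguments(record):
    """Turn one S3 event record into Glue job arguments."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "--input_path": f"s3://{bucket}/{key}",
        "--output_path": OUTPUT_PREFIX,
    }


def lambda_handler(event, context, glue=None):
    """Start one Glue job run per uploaded object."""
    if glue is None:
        # boto3 is available in the Lambda Python runtime.
        import boto3
        glue = boto3.client("glue")

    run_ids = []
    for record in event.get("Records", []):
        response = glue.start_job_run(
            JobName=JOB_NAME,
            Arguments=build_job_arguments(record),
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

The function only forwards the object location to Glue, so it stays small and fast regardless of file size; the heavy conversion work happens inside the ETL job.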
Analysis of other options:
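Option A (Lambda function):
Possible, but the company must write and maintain the conversion code, and each 1 GB file has to fit within Lambda's 15-minute timeout and memory limits.
Option B (Apache Spark job with Lambda trigger):
Spark needs a cluster or other runtime that the company must provision, tune, and maintain, which adds operational overhead.
Option C (Glue crawler + Athena + Lambda):
Adds a crawler, a table, scheduled queries, and custom result handling. The scheduled Lambda function also does not convert each file at upload time.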
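Key AWS services used in Option D:
Amazon S3 (input and output storage), AWS Lambda (event trigger), and AWS Glue (serverless ETL job).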
This solution provides a fully managed, serverless architecture that automatically scales to handle the workload with minimal operational overhead.
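For illustration, the Glue ETL job in Option D might look something like the script below. It is a sketch under stated assumptions: the `input_path` and `output_path` job arguments and the `parquet_output_path` helper are hypothetical, the CSV files are assumed to have a header row, and the `awsglue` and `pyspark` modules are imported lazily because they are only present in the Glue runtime.

```python
import sys


def parquet_output_path(input_path, output_prefix):
    """Map s3://bucket/dir/name.csv to <output_prefix>/name/ (hypothetical layout)."""
    file_name = input_path.rstrip("/").rsplit("/", 1)[-1]
    stem = file_name[:-4] if file_name.lower().endswith(".csv") else file_name
    return output_prefix.rstrip("/") + "/" + stem + "/"


def main(argv):
    # Deferred imports: these modules exist only inside the Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # input_path and output_path are assumed, user-defined job arguments.
    args = getResolvedOptions(argv, ["JOB_NAME", "input_path", "output_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Assumes a header row; all columns are read as strings unless a schema is supplied.
    df = spark.read.option("header", "true").csv(args["input_path"])
    target = parquet_output_path(args["input_path"], args["output_path"])
    df.write.mode("overwrite").parquet(target)
    job.commit()


# Run only when Glue-style arguments are present on the command line.
if __name__ == "__main__" and any(a.startswith("--JOB_NAME") for a in sys.argv):
    main(sys.argv)
```

Writing each converted file to its own prefix keeps concurrent job runs from overwriting one another's output; a different layout (for example, date-based partitions) would work equally well.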