
Answer-first summary for fast verification
Answer: Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
# Explanation

**Correct Answer: B**

AWS Glue is the most appropriate solution for this scenario.

## Why AWS Glue Is the Best Choice

1. **Serverless ETL service**: AWS Glue is a fully managed, serverless ETL service designed for data transformation tasks such as converting CSV to Parquet format.
2. **Minimal development effort**: AWS Glue provides built-in capabilities for common data transformations, including CSV-to-Parquet format conversion, through visual job creation or simple scripts.
3. **Automatic schema discovery**: An AWS Glue crawler automatically discovers the schema of the CSV files, reducing manual effort.
4. **S3 integration**: AWS Glue integrates natively with Amazon S3, making it easy to read from and write to S3 buckets.
5. **Scalability**: AWS Glue scales automatically to handle hundreds of files without manual cluster management.

## Why the Other Options Are Not Optimal

**Option A (Amazon EMR)**:
- Requires creating and managing an EMR cluster
- Requires writing a Spark application
- Involves more operational overhead and development effort

**Option C (AWS Batch)**:
- Requires creating job definitions with Bash syntax
- Involves more manual scripting and error handling
- Not specifically optimized for ETL transformations

**Option D (AWS Lambda)**:
- Lambda has a 15-minute execution time limit
- May not process hundreds of large files efficiently
- Requires more custom code for the data transformation
- Memory limitations may be a problem for large files

## Key Benefits of AWS Glue for This Use Case

- **No infrastructure management**: Fully serverless
- **Built-in transformations**: CSV-to-Parquet conversion is straightforward
- **Cost-effective**: Pay only for the resources consumed during job execution
- **Scheduling capabilities**: Jobs can easily be scheduled for daily processing
- **Error handling**: Built-in retry mechanisms and monitoring

This solution requires the least development effort while providing a robust, scalable, and managed solution for the data transformation requirements.
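To make the correct answer concrete, here is a minimal sketch of the kind of Glue ETL script that option B implies: it reads the table the crawler registered in the Data Catalog and writes Parquet to the transformed data bucket. The database name, table name, and bucket path are hypothetical placeholders, and the script assumes the AWS Glue PySpark runtime, so it is not runnable outside a Glue job.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job boilerplate: job name is passed in as a job argument.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV data via the Data Catalog table the crawler created.
# "reporting_db" and "daily_csv_files" are hypothetical names.
source = glue_context.create_dynamic_frame.from_catalog(
    database="reporting_db",
    table_name="daily_csv_files",
)

# Write the same records to the transformed data bucket as Parquet.
# The bucket path is a hypothetical placeholder.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://transformed-data-bucket/"},
    format="parquet",
)

job.commit()
```

The crawler keeps the catalog schema current as new files arrive, so the job itself contains no hand-written schema handling, which is where the "least development effort" advantage comes from.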
Author: LeetQuiz Editorial Team
A company's reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket. Which solution will meet these requirements with the LEAST development effort?
**A.** Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
**B.** Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
**C.** Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
**D.** Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.