
## Answer-first summary for fast verification
Answer: Create an AWS Glue extract, transform, and load (ETL) job that runs on a schedule. Configure the ETL job to process the .csv files and store the processed data in Amazon Redshift.
## Explanation

**Correct Answer: A**

**Why Option A is correct:**

1. **AWS Glue is a serverless ETL service** that requires minimal operational overhead: there is no infrastructure to manage.
2. **Scheduled execution** allows automated processing of the CSV files.
3. **Direct integration with Amazon Redshift**: the COTS application can run its SQL queries against Redshift.
4. **AWS Glue can process CSV files** and transform them into a format suitable for Redshift.
5. **Least operational overhead** of the four options, since AWS Glue is fully managed.

**Why the other options are incorrect:**

**Option B:**
- Requires managing EC2 instances (operational overhead for patching, scaling, and monitoring).
- Requires developing and maintaining a custom Python script.
- Cron scheduling on EC2 is less reliable than AWS-managed scheduling.
- Converting to .sql files may not produce a format the COTS application can use.

**Option C:**
- The COTS application supports only Amazon Redshift and Amazon S3 as data sources, so it cannot query DynamoDB.
- Lambda functions have a 15-minute execution limit, which may be insufficient for large ETL jobs.
- Requires managing DynamoDB table capacity and costs.
- A more complex architecture than the requirement calls for.

**Option D:**
- Amazon EMR requires significant operational overhead (cluster management, scaling, and monitoring).
- A weekly schedule may not meet frequent processing needs.
- EMR clusters are expensive to run and must be managed; this is overkill for a simple CSV transformation task.

**Key AWS services considered:**
- **AWS Glue**: serverless ETL service suited to scheduled data transformation.
- **Amazon Redshift**: data warehouse the COTS application can query.
- **Amazon S3**: source storage for the CSV files.

**Architecture pattern:** Legacy App → S3 (CSV files) → AWS Glue (scheduled ETL) → Amazon Redshift → COTS Application (SQL queries)
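To make the transform step concrete, here is a minimal Python sketch of the kind of per-row work a Glue ETL job would perform: parsing CSV text and casting each column to a typed record ready to load into a Redshift table. The column names and types are hypothetical, not from the question; a real Glue job would express this with DynamicFrames or Spark DataFrames rather than the standard library.

```python
import csv
import io

def transform_csv(csv_text, schema):
    """Parse CSV text and cast each column per the schema,
    yielding typed rows shaped for loading into a warehouse table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col: cast(record[col]) for col, cast in schema.items()}
        for record in reader
    ]

# Hypothetical schema for the legacy application's output.
schema = {"order_id": int, "amount": float, "region": str}

sample = "order_id,amount,region\n1001,19.99,eu-west-1\n1002,5.50,us-east-1\n"
rows = transform_csv(sample, schema)
print(rows[0])  # {'order_id': 1001, 'amount': 19.99, 'region': 'eu-west-1'}
```

In Glue itself, the equivalent casting is usually declared in the job's schema mapping, and the write to Redshift is handled by a Glue connection rather than hand-written load code.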
Author: LeetQuiz Editorial Team
## Question

A company uses a legacy application to produce data in CSV format. The legacy application stores the output data in Amazon S3. The company is deploying a new commercial off-the-shelf (COTS) application that can perform complex SQL queries to analyze data that is stored in Amazon Redshift and Amazon S3 only. However, the COTS application cannot process the .csv files that the legacy application produces.
The company cannot update the legacy application to produce data in another format. The company needs to implement a solution so that the COTS application can use the data that the legacy application produces.
Which solution will meet these requirements with the LEAST operational overhead?
**A.** Create an AWS Glue extract, transform, and load (ETL) job that runs on a schedule. Configure the ETL job to process the .csv files and store the processed data in Amazon Redshift.

**B.** Develop a Python script that runs on Amazon EC2 instances to convert the .csv files to .sql files. Invoke the Python script on a cron schedule to store the output files in Amazon S3.

**C.** Create an AWS Lambda function and an Amazon DynamoDB table. Use an S3 event to invoke the Lambda function. Configure the Lambda function to perform an extract, transform, and load (ETL) job to process the .csv files and store the processed data in the DynamoDB table.

**D.** Use Amazon EventBridge to launch an Amazon EMR cluster on a weekly schedule. Configure the EMR cluster to perform an extract, transform, and load (ETL) job to process the .csv files and store the processed data in an Amazon Redshift table.
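As a sketch of how option A could be wired up, the following builds the request parameters for a scheduled Glue job and trigger. All names, the IAM role ARN, the script location, and the cron expression are hypothetical placeholders; real use requires AWS credentials, an existing role, and an uploaded ETL script, so the boto3 calls are shown commented out.

```python
# Hypothetical job definition for boto3's glue.create_job().
job_def = {
    "Name": "csv-to-redshift-etl",
    "Role": "arn:aws:iam::123456789012:role/GlueEtlRole",  # placeholder ARN
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://example-etl-bucket/scripts/csv_to_redshift.py",
        "PythonVersion": "3",
    },
    "GlueVersion": "4.0",
}

# A scheduled trigger runs the job automatically (no cron host to manage).
trigger_def = {
    "Name": "nightly-csv-etl",
    "Type": "SCHEDULED",
    "Schedule": "cron(0 2 * * ? *)",  # 02:00 UTC daily, Glue cron syntax
    "Actions": [{"JobName": job_def["Name"]}],
    "StartOnCreation": True,
}

# import boto3
# glue = boto3.client("glue")
# glue.create_job(**job_def)
# glue.create_trigger(**trigger_def)
print(trigger_def["Schedule"])
```

Because the schedule lives in Glue rather than in cron on an EC2 instance, there is no host to patch or monitor, which is the "least operational overhead" property the question asks for.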