
Answer-first summary for fast verification
Answer: Use AWS Glue to create a job that reads the CSV data from Amazon S3, converts it to Parquet, and writes the result back to Amazon S3. Enable Glue's dynamic frame optimization feature to improve performance.
Option B is the most appropriate choice for this scenario. AWS Glue is a fully managed extract, transform, and load (ETL) service built for large-scale batch work such as format conversion. A Glue job reads the CSV objects from Amazon S3 into DynamicFrames (the "dynamic frame" abstraction the option refers to), applies schema and type conversions, and writes Parquet back to S3. Because the service is serverless, you pay only for the resources the job consumes, and the Parquet output in S3 is cheap to store and easy to query for further processing or analysis. The distractors are workable but weaker: EMR (A) requires provisioning and managing a cluster, Lambda (C) is constrained by its execution-time and memory limits for a large dataset, and Redshift's COPY command (D) loads data into the cluster rather than converting it to Parquet (export from Redshift is done with UNLOAD), while the cluster itself adds cost.
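The performance gain from the migration comes from Parquet's columnar layout. As a minimal standard-library sketch (not Glue code; Glue's DynamicFrame API performs this at scale and adds per-column encoding and compression), pivoting row-oriented CSV records into column arrays illustrates the core transformation a CSV-to-Parquet job carries out:

```python
import csv
import io

def rows_to_columns(csv_text):
    """Pivot row-oriented CSV records into a column-oriented dict,
    which is the layout Parquet stores on disk (plus per-column
    encoding and compression that this sketch omits)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

sample = "id,price\n1,9.99\n2,4.50\n3,12.00\n"
cols = rows_to_columns(sample)
# Column-oriented access: a query touching only `price`
# reads a single contiguous array instead of every row.
print(cols["price"])  # → ['9.99', '4.50', '12.00']
```

In an actual Glue job the same pivot is implicit: the script reads the S3 source into a DynamicFrame and writes it back with the Parquet format option, so no hand-written column handling is needed.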
Author: LeetQuiz Editorial Team
Your company is planning to migrate a large dataset from a CSV format to Apache Parquet for better performance and efficiency in data processing. You are tasked with designing a data transformation pipeline using AWS services. What steps should you take to ensure the pipeline is optimized for performance and cost-effectiveness?
A. Use Amazon EMR to process the data and convert it to Parquet format, then store the result in Amazon S3.
B. Use AWS Glue to create a job that reads the CSV data from Amazon S3, converts it to Parquet, and writes the result back to Amazon S3. Enable Glue's dynamic frame optimization feature to improve performance.
C. Use AWS Lambda to process the CSV data in small batches, convert it to Parquet format, and store the result in Amazon S3. Monitor the Lambda function's performance and adjust the batch size as needed.
D. Use Amazon Redshift to load the CSV data, convert it to Parquet format using the COPY command, and store the result in Amazon S3. Optimize the Redshift cluster for performance by choosing the appropriate node type and number of nodes.