
Answer-first summary for fast verification
Answer: Use AWS Glue to create a job that reads the CSV data from Amazon S3, converts it to Parquet, and writes the result back to Amazon S3. Enable Glue's dynamic frame optimization feature to improve performance.
Option B is the most appropriate choice for this scenario. AWS Glue is a fully managed extract, transform, and load (ETL) service built for large-scale batch work such as format conversion. A Glue job reads the CSV objects from Amazon S3 into DynamicFrames (the "dynamic frame" abstraction the option refers to), applies schema and type conversions, and writes Parquet back to S3. Because the service is serverless, you pay only for the resources the job consumes, and the Parquet output in S3 is cheap to store and easy to query for further processing or analysis. The distractors are workable but weaker: EMR (A) requires provisioning and managing a cluster, Lambda (C) is constrained by its execution-time and memory limits for a large dataset, and Redshift's COPY command (D) loads data into the cluster rather than converting it to Parquet (export from Redshift is done with UNLOAD), while the cluster itself adds cost.
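The performance gain from the migration comes from Parquet's columnar layout. As a minimal standard-library sketch (not Glue code; Glue's DynamicFrame API performs this at scale and adds per-column encoding and compression), pivoting row-oriented CSV records into column arrays illustrates the core transformation a CSV-to-Parquet job carries out:

```python
import csv
import io

def rows_to_columns(csv_text):
    """Pivot row-oriented CSV records into a column-oriented dict,
    which is the layout Parquet stores on disk (plus per-column
    encoding and compression that this sketch omits)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

sample = "id,price\n1,9.99\n2,4.50\n3,12.00\n"
cols = rows_to_columns(sample)
# Column-oriented access: a query touching only `price`
# reads a single contiguous array instead of every row.
print(cols["price"])  # → ['9.99', '4.50', '12.00']
```

In an actual Glue job the same pivot is implicit: the script reads the S3 source into a DynamicFrame and writes it back with the Parquet format option, so no hand-written column handling is needed.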
Author: LeetQuiz Editorial Team
Your company is planning to migrate a large dataset from a CSV format to Apache Parquet for better performance and efficiency in data processing. You are tasked with designing a data transformation pipeline using AWS services. What steps should you take to ensure the pipeline is optimized for performance and cost-effectiveness?
A. Use Amazon EMR to process the data and convert it to Parquet format, then store the result in Amazon S3.
B. Use AWS Glue to create a job that reads the CSV data from Amazon S3, converts it to Parquet, and writes the result back to Amazon S3. Enable Glue's dynamic frame optimization feature to improve performance.
C. Use AWS Lambda to process the CSV data in small batches, convert it to Parquet format, and store the result in Amazon S3. Monitor the Lambda function's performance and adjust the batch size as needed.
D. Use Amazon Redshift to load the CSV data, convert it to Parquet format using the COPY command, and store the result in Amazon S3. Optimize the Redshift cluster for performance by choosing the appropriate node type and number of nodes.