
Explanation:
Apache Parquet is a columnar storage format — when users select specific columns, Parquet allows Athena to read only those columns’ data blocks, skipping all other columns entirely. This dramatically reduces data scanned (and therefore cost and time). Snappy compression further reduces file sizes and improves read speeds. CSV remains row-based even with compression, so Options B and D provide only marginal improvement.
Ultimate access to all questions.
Question 40.
A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column. Which solution will MOST speed up the Athena query performance?
A
Change the data format from .csv to JSON format. Apply Snappy compression.
B
Compress the .csv files by using Snappy compression.
C
Change the data format from .csv to Apache Parquet. Apply Snappy compression.
D
Compress the .csv files by using gzip compression.
No comments yet.