
Answer-first summary for fast verification
Answer: Use a columnar storage file format; partition the data based on the most common query predicates.
For the fastest Amazon Redshift Spectrum queries, store the data in Amazon S3 in a columnar format (such as Apache Parquet or ORC) so that Spectrum reads only the columns a query references instead of every column of every row. Partitioning the data on the most common query predicates lets Spectrum prune partitions that cannot match the filter, so less data is scanned and the query finishes sooner. The other options work against parallel reads: a gzip-compressed file of 1 GB to 5 GB cannot be split across workers, files smaller than 10 KB add per-file overhead, and non-splittable formats limit parallelism by design.
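The combined effect of column projection and partition pruning can be sketched with a small, standard-library-only simulation. This is only an illustration under stated assumptions, not how Redshift Spectrum is implemented: the object keys, column names, and byte counts are hypothetical, and `partition_values` and `bytes_scanned` are helper names invented for this example. It assumes Hive-style `name=value` path segments in the S3 keys.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    key: str            # hypothetical S3 object key with Hive-style partitions
    column_bytes: dict  # hypothetical bytes stored per column in this file


def partition_values(key: str) -> dict:
    """Parse `name=value` path segments from an object key."""
    values = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


def bytes_scanned(files, predicate, columns):
    """Estimate bytes read for a query.

    `predicate` maps partition column -> required value (equality only).
    `columns` lists the columns the query references.
    """
    total = 0
    for f in files:
        values = partition_values(f.key)
        # Partition pruning: skip files whose partition cannot match.
        if any(values.get(name) != wanted for name, wanted in predicate.items()):
            continue
        # Column projection: a columnar file lets us read only these columns.
        total += sum(f.column_bytes[c] for c in columns)
    return total


# Made-up layout: three files, each holding 1000 bytes across three columns.
layout = {"order_id": 100, "amount": 200, "notes": 700}
files = [
    DataFile("sales/year=2023/month=01/part-0000.parquet", dict(layout)),
    DataFile("sales/year=2023/month=02/part-0000.parquet", dict(layout)),
    DataFile("sales/year=2024/month=01/part-0000.parquet", dict(layout)),
]

# Full scan: every partition, every column.
full = bytes_scanned(files, {}, ["order_id", "amount", "notes"])

# Filter on the partition column and read a single column.
pruned = bytes_scanned(files, {"year": "2024"}, ["amount"])

print(full, pruned)
```

In this toy layout the filtered, single-column query touches a small fraction of the bytes that a full scan reads, which is the saving that options B and C aim for.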
Author: Ritesh Yadav
Question 42
A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3. Which actions will provide the FASTEST queries? (Choose two.)
A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
B. Use a columnar storage file format.
C. Partition the data based on the most common query predicates.
D. Split the data into files that are less than 10 KB.
E. Use file formats that are not splittable.