In a data lakehouse environment, how should you model a dataset that is frequently queried for the latest month's data but also contains years of historical data, so that query performance and storage cost stay balanced?
A
Normalize the dataset into several tables based on access patterns, without considering the impact on storage costs.
B
Store all data in the most cost-effective storage available and rely heavily on caching to optimize query performance.
C
Partition the data by month and use data tiering, placing the current month's data in high-performance storage and older data in more cost-effective storage.
D
Duplicate the latest month's data into a separate high-performance storage system and join it with the historical data at query time.
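The partition-and-tier approach in option C can be sketched as a simple routing rule: the current month's partition lands in high-performance ("hot") storage, while every older partition goes to cheaper ("cold") storage. The function and tier names below are illustrative, not part of any specific lakehouse API; real systems (e.g. table-format lifecycle policies) apply the same logic declaratively.

```python
from datetime import date

def storage_tier(partition_month: str, today: date) -> str:
    """Route a month partition (format 'YYYY-MM') to a storage tier.

    Only the current month is kept in high-performance storage;
    all older partitions move to cost-effective storage.
    """
    current = f"{today.year:04d}-{today.month:02d}"
    return "hot" if partition_month == current else "cold"

# With today = 2024-06-15, only the 2024-06 partition is hot:
today = date(2024, 6, 15)
print(storage_tier("2024-06", today))  # hot
print(storage_tier("2021-01", today))  # cold
```

Because the data is partitioned by month, a query filtered to the latest month touches only the hot partition, while historical scans read the cold tier without inflating storage cost.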