When dealing with highly sparse datasets, such as user interaction matrices filled with null values, which storage format or technique optimizes both query performance and storage efficiency?
A
Normalize the dataset into multiple tables to separate dense columns from sparse ones, reducing storage overhead.
B
Use Delta Lake's binary storage format with custom compression algorithms tailored to sparse data.
C
Store data in a columnar format like Parquet, leveraging its built-in compression mechanisms for null values.
D
Implement a custom sparse matrix storage format as a UDF that compresses null values and decompresses them during queries.
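
As a rough illustration of the mechanism option C refers to: Parquet tracks which entries are null through definition levels, which are run-length encoded, and writes only the non-null values to its data pages. The sketch below mimics that idea in plain Python. The function names `encode_sparse_column` and `decode_sparse_column` are hypothetical, and this is not Parquet's actual on-disk layout, only a simplified model of why long runs of nulls cost very little space.

```python
from itertools import groupby


def encode_sparse_column(values):
    """Split a column into run-length-encoded presence flags and its non-null values.

    Hypothetical helper: a simplified stand-in for definition levels plus
    run-length encoding, not the real Parquet encoder.
    """
    # 1 marks a present value, 0 marks a null.
    flags = [0 if v is None else 1 for v in values]
    # Collapse consecutive identical flags into (flag, run_length) pairs.
    runs = [(flag, sum(1 for _ in group)) for flag, group in groupby(flags)]
    # Only the non-null values are kept; nulls take no space here.
    dense = [v for v in values if v is not None]
    return runs, dense


def decode_sparse_column(runs, dense):
    """Rebuild the original column from the runs and the non-null values."""
    remaining = iter(dense)
    column = []
    for flag, count in runs:
        for _ in range(count):
            column.append(next(remaining) if flag else None)
    return column


# A user-interaction column of 3503 entries where only 3 are non-null.
column = [None] * 1000 + [5] + [None] * 2000 + [3, 4] + [None] * 500
runs, dense = encode_sparse_column(column)
restored = decode_sparse_column(runs, dense)
```

In this toy example the 3503-entry column reduces to five runs plus three stored values, and decoding restores the original column exactly. A real columnar file adds dictionary encoding and page-level compression on top, so actual sizes will differ.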