
Answer-first summary for fast verification
Answer: E (date)
Partitioning a Delta Lake table effectively means choosing a low-cardinality column that splits the data into a manageable number of reasonably large partitions, rather than many small files. The `date` column (E) is the best choice: it yields one partition per day, a common pattern for time-series data that lets queries filtering on date ranges skip irrelevant partitions and keeps each partition a manageable size. `post_time` (A) is a timestamp and therefore too granular; nearly every row would land in its own partition, producing excessive small files. `latitude` (B) and `user_id` (D) have high cardinality, which likewise results in too many partitions, and `post_id` (C) is unique to each post, making it unsuitable for partitioning.
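The cardinality argument can be checked with a rough simulation. The sketch below does not use Spark or Delta Lake; it builds hypothetical rows shaped like the schema in the question and counts distinct values per candidate column, since each distinct value of a partition column becomes its own partition. The row count, day range, user count, and the helper names `make_posts` and `partition_counts` are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def make_posts(n_posts=10_000, n_days=30, n_users=2_000, seed=0):
    """Generate hypothetical post metadata rows (synthetic, for illustration only)."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n_posts):
        # Random timestamp within the simulated window of n_days.
        post_time = start + timedelta(seconds=rng.randrange(n_days * 86_400))
        rows.append({
            "user_id": rng.randrange(n_users),
            "post_id": f"post-{i}",          # unique per post
            "latitude": round(rng.uniform(-90, 90), 4),
            "post_time": post_time,
            "date": post_time.date(),        # derived from post_time
        })
    return rows


def partition_counts(rows, columns):
    """Distinct values per column, i.e. how many partitions each choice would create."""
    return {c: len({r[c] for r in rows}) for c in columns}


posts = make_posts()
counts = partition_counts(
    posts, ["post_time", "latitude", "post_id", "user_id", "date"]
)

# Print candidates from fewest to most partitions.
for column, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{column:>10}: {n} partitions, ~{len(posts) / n:.1f} rows each")
```

Under these assumptions, `date` produces at most one partition per simulated day, while the other columns produce hundreds or thousands of partitions holding only a handful of rows each. Real tables will differ in scale, but the ordering of the candidates is what matters here.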
Author: LeetQuiz Editorial Team
Given a Delta Lake table with the following schema for user content post metadata:
user_id LONG,
post_text STRING,
post_id STRING,
longitude FLOAT,
latitude FLOAT,
post_time TIMESTAMP,
date DATE
Which column would be the most suitable for partitioning the Delta table?
A. post_time
B. latitude
C. post_id
D. user_id
E. date