Databricks Certified Data Engineer - Professional

You are creating a Delta Lake table in Databricks to store IoT sensor readings for a manufacturing plant. The dataset contains the following columns:

event_date (date of reading)
machine_id (about 50 machines)
temperature (numeric)
vibration_level (numeric)

The table will store 10 years of data, and queries almost always filter by event_date (e.g., WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'). Occasionally, queries also filter by machine_id.

Which table definition BEST optimizes query performance and storage layout?

Explanation:

Partitioning by event_date is the best choice. event_date has relatively low cardinality (one partition per day, roughly 3,650 over 10 years) and appears in the filter of almost every query, so PARTITIONED BY (event_date) lets the engine prune partitions and read only the days in the requested range.
Option B: machine_id has only ~50 values, but queries filter on it only occasionally, so partitioning by it would not help the common date-range queries.
Option C: Clustering on event_date is less effective than partitioning here, because partition pruning eliminates entire partitions for date-range filters.
Option D: Clustering on machine_id won't help much, since machine_id is not the primary filter column.
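
As a rough illustration of the partition-pruning argument, the sketch below holds a possible table definition and counts how many daily partitions a one-month filter would touch. The table name `sensor_readings`, the column types, and the 2015 to 2024 retention window are assumptions made for the example; the question gives only the column names and the 10-year span. The DDL string is not executed here; in a Databricks notebook it could be passed to `spark.sql(...)`.

```python
from datetime import date

# Hypothetical DDL: the table name and column types are assumptions,
# only the column names and the partitioning column come from the question.
DDL = """
CREATE TABLE sensor_readings (
  event_date      DATE,
  machine_id      INT,
  temperature     DOUBLE,
  vibration_level DOUBLE
)
USING DELTA
PARTITIONED BY (event_date)
"""


def daily_partitions(start: date, end: date) -> int:
    """Count the daily partitions in the inclusive range [start, end]."""
    return (end - start).days + 1


# Assumed 10-year retention window (not stated in the question).
total = daily_partitions(date(2015, 1, 1), date(2024, 12, 31))

# The example filter from the question:
# WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
scanned = daily_partitions(date(2024, 1, 1), date(2024, 1, 31))

print(f"partitions in table: {total}")
print(f"partitions read by the January 2024 filter: {scanned}")
print(f"share of partitions read: {scanned / total:.2%}")
```

Under these assumptions the January filter touches 31 of 3,653 daily partitions, which is why a date-partitioned layout suits queries that almost always filter by event_date.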