
You are creating a Delta Lake table in Databricks to store IoT sensor readings for a manufacturing plant.
The dataset contains the following columns:
event_date (date of reading)
machine_id (about 50 machines)
temperature (numeric)
vibration_level (numeric)
The table will store 10 years of data, and queries almost always filter by event_date (e.g., WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31').
Occasionally, queries also filter by machine_id.
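For illustration, the two query shapes described above might look like the sketch below. The table name sensor_readings is taken from the answer options; the aggregates and the literal machine_id value 17 are assumptions made up for this example, since the question does not state the column's type or the exact queries.

```sql
-- Common case: filter on a date range only.
SELECT event_date,
       AVG(temperature)     AS avg_temperature,
       MAX(vibration_level) AS max_vibration_level
FROM sensor_readings
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY event_date;

-- Occasional case: the same date filter plus a single machine.
-- The value 17 is hypothetical; machine_id may not be numeric.
SELECT event_date, temperature, vibration_level
FROM sensor_readings
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND machine_id = 17;
```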
Which table definition BEST optimizes query performance and storage layout?
A
CREATE TABLE sensor_readings USING DELTA PARTITIONED BY (event_date) AS SELECT * FROM raw_data;
B
CREATE TABLE sensor_readings USING DELTA PARTITIONED BY (machine_id) AS SELECT * FROM raw_data;
C
CREATE TABLE sensor_readings USING DELTA CLUSTER BY (event_date) AS SELECT * FROM raw_data;
D
CREATE TABLE sensor_readings USING DELTA CLUSTER BY (machine_id) AS SELECT * FROM raw_data;