
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.
Which approach can the data engineer take to identify the table that is dropping the records?
A. They can set up separate expectations for each table when developing their DLT pipeline.
B. They can navigate to the DLT pipeline page, click on the "Error" button, and review the present errors.
C. They can set up DLT to notify them via email when records are dropped.
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
Explanation:
Correct Answer: D - They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
Why Option D is correct: The DLT pipeline page in the Databricks UI displays per-table data quality metrics. By clicking each table in the pipeline graph, the engineer can see how many records passed or were dropped by each expectation, which pinpoints exactly where in the pipeline records are being filtered out.
Why other options are incorrect:
Option A: Setting up separate expectations for each table is a development-time configuration, not a monitoring approach. While expectations define what constitutes valid/invalid data, they don't help identify which table is currently dropping records during pipeline execution.
Option B: The "Error" button shows pipeline execution errors and failures, not data quality drops. Records dropped because they violated an expectation are not errors in the traditional sense; dropping them is the expected, configured behavior.
Option C: While DLT can be configured to send notifications, email notifications don't provide the granular detail needed to identify which specific table is dropping records. Notifications typically alert about pipeline failures or completions, not detailed data quality statistics per table.
Key Concept: Delta Live Tables provides built-in data quality monitoring through the Databricks UI, where users can view expectation metrics for each table, including counts of records that passed and failed expectations, helping identify where data is being filtered out in the pipeline.
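To make the mechanism concrete, here is a minimal, self-contained Python sketch (not actual DLT code) that simulates how per-table expectation metrics reveal where records are dropped. The table names (`bronze`, `silver`), expectation names, and sample records are all illustrative assumptions, not part of the question.

```python
# Simulate per-table "expect or drop" metrics, similar in spirit to the
# data quality statistics DLT shows for each table in its pipeline UI.

def apply_expectation(records, expectation, predicate, metrics, table):
    """Keep records matching the predicate; record pass/drop counts per table."""
    passed = [r for r in records if predicate(r)]
    metrics[table] = {
        "expectation": expectation,
        "passed": len(passed),
        "dropped": len(records) - len(passed),
    }
    return passed

metrics = {}

# Hypothetical raw input: one record has a NULL id, one a negative amount.
raw = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]

# Stage 1 ("bronze"): drop records with a missing id.
bronze = apply_expectation(
    raw, "valid_id", lambda r: r["id"] is not None, metrics, "bronze"
)

# Stage 2 ("silver"): drop records with a non-positive amount.
silver = apply_expectation(
    bronze, "positive_amount", lambda r: r["amount"] > 0, metrics, "silver"
)

# Inspecting the metrics per table shows exactly where records were dropped.
for table, m in metrics.items():
    print(table, m)
```

Running this prints one metrics line per table, showing that `bronze` dropped one record (the NULL id) and `silver` dropped another (the negative amount), which is the same per-table breakdown the engineer would read off the DLT pipeline page.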