
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.
Which approach can the data engineer take to identify the table that is dropping the records?
A
They can set up separate expectations for each table when developing their DLT pipeline.
B
They can navigate to the DLT pipeline page, click on the "Error" button, and review the present errors.
C
They can set up DLT to notify them via email when records are dropped.
D
They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
Explanation:
In Delta Live Tables (DLT), when you want to identify at which specific table in the pipeline records are being dropped due to quality concerns, the correct approach is:
Option D: Navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
Here's why:
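Data Quality Statistics in DLT: DLT provides detailed data quality metrics for each table in the pipeline. When you define an expectation that drops invalid rows (CONSTRAINT ... EXPECT (...) ON VIOLATION DROP ROW in SQL, or the expect_or_drop and expect_all_or_drop decorators in Python), DLT tracks how many records passed and failed each expectation and how many records were dropped as a result.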
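To make those tracked quantities concrete, here is a minimal plain-Python sketch of drop-row semantics for a single table. It does not use the DLT API: the function apply_drop_expectations, the rule names, and the sample rows are all hypothetical and only illustrate how per-expectation failure counts and a dropped-row count relate to each other.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def apply_drop_expectations(
    rows: List[Row],
    expectations: Dict[str, Callable[[Row], bool]],
) -> Tuple[List[Row], Dict[str, int], int]:
    """Keep rows that satisfy every expectation and count the violations.

    Hypothetical helper for illustration only; not part of DLT.
    """
    failed = {name: 0 for name in expectations}
    kept: List[Row] = []
    dropped = 0
    for row in rows:
        # Names of all expectations this row violates.
        violated = [name for name, check in expectations.items() if not check(row)]
        for name in violated:
            failed[name] += 1
        if violated:
            dropped += 1  # a row is dropped once, however many rules it breaks
        else:
            kept.append(row)
    return kept, failed, dropped


# Made-up sample data and rules.
sample_rows = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
    {"id": None, "amount": -1},
]
rules = {
    "valid_id": lambda r: r["id"] is not None,
    "positive_amount": lambda r: r["amount"] > 0,
}

kept_rows, failed_counts, dropped_count = apply_drop_expectations(sample_rows, rules)
print(failed_counts, dropped_count, len(kept_rows))
```

Note that a row violating two expectations is counted under both rules but dropped only once, so per-expectation failure counts can add up to more than the number of dropped rows.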
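Table-level Monitoring: By clicking on each table in the DLT pipeline UI, you can view the data quality statistics for that table, including how many records failed its expectations and were dropped.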
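Why other options are incorrect:
A: The pipeline already drops invalid records at each table, so expectations are already in place. Defining them does not by itself show where records are being dropped.
B: Records dropped by an expectation are not pipeline errors. The update keeps running, so reviewing errors does not reveal which table dropped them.
C: An email notification does not by itself provide the per-table data quality statistics needed to locate the drops.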
Best Practice: When troubleshooting data quality issues in DLT pipelines, define expectations on every table (for example, with expect_all in Python) for comprehensive quality monitoring.
This approach allows the data engineer to systematically check each table's data quality metrics to identify the specific table(s) where records are being dropped due to quality violations.
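The same per-table comparison can also be scripted against the pipeline event log, which records data quality metrics for each flow. The sketch below is plain Python over events that have already been parsed into dictionaries. The assumed layout (a flow_progress event whose details contain data_quality.expectations entries with dataset, name, passed_records, and failed_records) reflects one reading of the event log schema and should be verified in your workspace. The function name, table names, and sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def failed_records_by_table(events: Iterable[dict]) -> Dict[str, int]:
    """Sum failed_records per dataset across flow_progress events.

    Assumes each event is a dict shaped like the (assumed) event log layout
    described above; events without data quality details are skipped.
    """
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        if event.get("event_type") != "flow_progress":
            continue
        progress = event.get("details", {}).get("flow_progress", {})
        quality = progress.get("data_quality") or {}
        for expectation in quality.get("expectations", []):
            totals[expectation["dataset"]] += expectation.get("failed_records", 0)
    return dict(totals)


# Made-up events for a three-table pipeline (table names are hypothetical).
sample_events = [
    {"event_type": "flow_progress", "details": {"flow_progress": {"data_quality": {
        "expectations": [{"name": "valid_id", "dataset": "bronze",
                          "passed_records": 100, "failed_records": 0}]}}}},
    {"event_type": "flow_progress", "details": {"flow_progress": {"data_quality": {
        "expectations": [{"name": "valid_id", "dataset": "silver",
                          "passed_records": 58, "failed_records": 42}]}}}},
    {"event_type": "flow_progress", "details": {"flow_progress": {"data_quality": {
        "expectations": [{"name": "valid_total", "dataset": "gold",
                          "passed_records": 58, "failed_records": 0}]}}}},
    {"event_type": "update_progress", "details": {}},
]

totals = failed_records_by_table(sample_events)
worst_table = max(totals, key=totals.get)
print(totals, worst_table)
```

Because failures are summed per expectation, a row that violates several expectations on the same table is counted more than once. The totals are best read as a signal of where to look, not as an exact dropped-row count.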