
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped. Which of the following approaches can the data engineer take to identify the table that is dropping the records?
A. They can set up separate expectations for each table when developing their DLT pipeline.
B. They cannot determine which table is dropping the records.
C. They can set up DLT to notify them via email when records are dropped.
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
E. They can navigate to the DLT pipeline page, click on the "Error" button, and review the present errors.
Explanation:
The correct answer is D.
In Delta Live Tables (DLT), you can monitor data quality metrics and dropped records through the DLT pipeline UI. Here's why:
DLT Pipeline Page: When you navigate to a DLT pipeline in Databricks, you can see all the tables in your pipeline.
Table Details: By clicking on each table, you can view detailed information, including the number of records written, the number of records dropped by expectations, and the pass/fail counts for each named expectation. Comparing these statistics across tables shows exactly where in the pipeline records are being dropped (see the sketch below).
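For illustration, a minimal sketch of what per-table drop expectations could look like in the Python DLT API follows; the table names, columns, and source path are hypothetical, not the engineer's actual pipeline. The key point is that each named expectation is evaluated at a specific table, so the drop counts reported in the UI are attributed to that table.

```python
# A minimal sketch, assuming the Python DLT API; the table names, column
# names, and source path below are hypothetical. Each named expect_or_drop
# produces the per-table drop counts that the pipeline UI surfaces as that
# table's data quality statistics.
import dlt
from pyspark.sql.functions import col

@dlt.table
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def bronze_orders():
    # Rows failing "valid_order_id" are dropped here and counted
    # against this table's data quality metrics.
    return spark.read.format("json").load("/mnt/raw/orders")  # hypothetical source

@dlt.table
@dlt.expect_or_drop("positive_amount", "amount > 0")
def silver_orders():
    # Rows failing "positive_amount" are dropped at this table, not
    # upstream, so the UI attributes the drops to silver_orders.
    return dlt.read("bronze_orders").where(col("amount").isNotNull())

@dlt.table
@dlt.expect_or_drop("known_region", "region IN ('US', 'EU', 'APAC')")
def gold_orders_by_region():
    # Aggregated output; rows with an unknown region are dropped here.
    return dlt.read("silver_orders").groupBy("region").count()
```

Clicking any of these three tables on the pipeline page then shows how many records its own expectation dropped, which is exactly what answer D describes.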
Why the other options are incorrect:
A: Expectations are already configured at each table (that is why records are being dropped); defining additional expectations does not, by itself, reveal where the drops occur.
B: Incorrect, because DLT records per-table data quality metrics, so the dropping table can be identified.
C: DLT email notifications cover pipeline-level events such as update failures, not individual records dropped by expectations.
E: There is no "Error" button that lists dropped records; records dropped by an expectation are reported as data quality metrics, not as pipeline errors.
Best Practice: To effectively monitor data quality in DLT pipelines, regularly check the data quality statistics for each table in the DLT pipeline UI, and consider setting up expectations with descriptive names to make troubleshooting easier.
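The statistics shown in the UI are also recorded in the pipeline's event log, so they can be checked programmatically as part of regular monitoring. The sketch below is a hedged example: the storage path is a placeholder for the pipeline's configured storage location, and the JSON layout of the details field is an assumption that can vary across DLT releases.

```python
# A hedged sketch: aggregate dropped-record counts per table and expectation
# from the DLT event log. The path below is a placeholder for the pipeline's
# storage location, and the JSON layout of `details` is an assumption.
from pyspark.sql.functions import col, explode, from_json, get_json_object, sum as spark_sum

# The event log is stored as a Delta table under the pipeline's storage location.
events = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events"  # placeholder path
)

expectations_schema = (
    "array<struct<name: string, dataset: string, "
    "passed_records: bigint, failed_records: bigint>>"
)

dropped_by_table = (
    events
    .where(col("event_type") == "flow_progress")
    .select(
        explode(
            from_json(
                get_json_object(col("details"), "$.flow_progress.data_quality.expectations"),
                expectations_schema,
            )
        ).alias("e")
    )
    .groupBy("e.dataset", "e.name")
    .agg(spark_sum("e.failed_records").alias("dropped_records"))
    .orderBy(col("dropped_records").desc())
)

dropped_by_table.show(truncate=False)
```

Sorting by dropped_records makes the table (dataset) responsible for the most drops stand out immediately, complementing the per-table view in the pipeline UI.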