
**Answer (for fast verification): A. Maintain data quality rules separately from the pipeline**
## Explanation

**Option A** is the correct answer because maintaining data quality rules separately from the pipeline follows the best practice of separation of concerns and enables reuse across multiple tables and pipelines.

### Why Option A is correct

- **Reusability**: By maintaining data quality rules separately, the same set of rules can be applied to multiple tables without duplicating code.
- **Maintainability**: Changes to data quality rules can be made in one place and automatically apply to every table that uses those rules.
- **Separation of concerns**: Keeping data quality logic separate from data transformation logic makes both easier to manage.
- **CI/CD integration**: Separately maintained rules can be version-controlled and deployed independently of the pipelines that consume them.

### Why the other options are incorrect

- **Option B**: Running a separate pipeline concurrently does not guarantee the rules are applied before downstream consumers read the data, and it can introduce timing issues.
- **Option C**: Tagging datasets does not apply or enforce data quality rules; tags are only metadata.
- **Option D**: A task dependency is better than concurrent execution, but it still embeds the rules within one workflow rather than maintaining them separately for reuse.

This approach aligns with Databricks best practices for data quality management in CI/CD workflows.
## Question: Databricks CI/CD Workflows

A data engineer needs to apply a common set of data quality rules to multiple tables. Which of the following best practices can they follow to do this? Select one response.

- **A.** Maintain data quality rules separately from the pipeline
- **B.** Create a separate pipeline containing the data quality rules and run it concurrently with the pipeline
- **C.** Tag the dataset used to populate the tables in the pipeline with data quality rule definitions
- **D.** Create a task in Workflows for data quality rules and make it a dependency of the pipeline
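The pattern behind Option A can be sketched in plain Python. This is a minimal illustration, not Databricks API code: in a real Delta Live Tables pipeline the rules dictionary would be passed to decorators such as `@dlt.expect_all`, which only run inside a Databricks pipeline. All names here (`COMMON_RULES`, `apply_rules`) are hypothetical, chosen to show rules maintained as data in one shared place and reused across multiple tables.

```python
# Hypothetical sketch of "rules maintained separately from the pipeline".
# The rules live in one shared module as plain data, so any number of
# tables or pipelines can reuse them without duplicating logic.

# --- shared_rules.py (separate from any single pipeline) ---
COMMON_RULES = {
    "valid_id": lambda row: row.get("id") is not None,
    "positive_amount": lambda row: row.get("amount", 0) > 0,
}

def apply_rules(rows, rules):
    """Return rows that pass every rule, plus a per-rule violation count."""
    violations = {name: 0 for name in rules}
    passed = []
    for row in rows:
        ok = True
        for name, check in rules.items():
            if not check(row):
                violations[name] += 1
                ok = False
        if ok:
            passed.append(row)
    return passed, violations

# --- pipeline code: the same rules applied to two different tables ---
orders = [{"id": 1, "amount": 10.0}, {"id": None, "amount": 5.0}]
refunds = [{"id": 2, "amount": -3.0}]

clean_orders, order_violations = apply_rules(orders, COMMON_RULES)
clean_refunds, refund_violations = apply_rules(refunds, COMMON_RULES)
```

Because the rules are data rather than code embedded in one workflow, updating `COMMON_RULES` in the shared module changes enforcement everywhere at once, which is exactly the reusability and maintainability argument for Option A.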