Explanation
Option A is the correct answer because maintaining data quality rules separately from the pipeline follows the best practice of separation of concerns and enables reusability across multiple tables and pipelines.
Why Option A is correct:
- Reusability: Maintaining data quality rules separately lets the same set of rules be applied to multiple tables without duplicating code (see the sketch after this list)
- Maintainability: Changes to data quality rules can be made in one place and automatically apply to all tables using those rules
- Separation of Concerns: Keeps data quality logic separate from data transformation logic, making both easier to manage
- CI/CD Integration: Separate data quality rules can be version-controlled and deployed independently
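A minimal sketch of this pattern using Delta Live Tables expectations, assuming the rules live in their own Delta table. The table name shared.data_quality_rules, its (name, constraint, tag) schema, and the get_rules helper are illustrative, and spark is the SparkSession the pipeline runtime provides:

```python
import dlt
from pyspark.sql.functions import col


def get_rules(tag):
    """Load all rules with the given tag from the shared rules table.

    Each row is assumed to hold a rule name and a SQL constraint
    expression, e.g. ("valid_order_id", "order_id IS NOT NULL", "validity").
    """
    df = spark.read.table("shared.data_quality_rules").filter(col("tag") == tag)
    return {row["name"]: row["constraint"] for row in df.collect()}


# The same rule set can decorate any number of tables; editing a row in
# shared.data_quality_rules updates every table that loads that tag.
@dlt.table
@dlt.expect_all_or_drop(get_rules("validity"))
def orders_clean():
    return spark.read.table("raw.orders")
```

Because the rules are just rows in a table (or entries in a config file), they can be reviewed, version-controlled, and promoted through environments independently of the pipeline code.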
Why other options are incorrect:
- Option B: Running a separate pipeline concurrently doesn't guarantee the data quality rules are applied before downstream tasks consume the data, so it can introduce race conditions and timing issues
- Option C: Tagging datasets only attaches metadata; it neither applies nor enforces data quality rules
- Option D: While creating a task dependency is better than concurrent execution, it still embeds the rules within a single workflow rather than maintaining them separately where they can be reused
This approach aligns with Databricks best practices for data quality management in CI/CD workflows.