You are about to deploy a major update to a data transformation pipeline in Azure Databricks. What is the best strategy to ensure the updated pipeline maintains or enhances data quality before going live?
A
Use Azure Data Factory to orchestrate a parallel run of both the current and updated pipelines, comparing outputs for discrepancies.
B
Implement unit and integration tests within Databricks notebooks that validate data outputs against a controlled set of test data, and integrate these tests into your CI/CD pipeline (illustrated in the sketch below).
C
Conduct manual data validation by comparing outputs from the updated pipeline against expected results for a sample of test data.
D
Leverage Databricks MLflow to track experiment runs with the new pipeline version, using statistical analysis to ensure data quality metrics meet predefined thresholds.
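A minimal sketch of the approach in option B: a PySpark unit test that runs a transformation against a controlled set of test data and asserts on the output, suitable for execution in a CI/CD pipeline. The transform function `clean_orders` and its column names are hypothetical placeholders, not taken from the question.

```python
# Sketch only: `clean_orders`, `order_id`, and `amount` are assumed
# names for illustration, not part of the original question.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the test can run on a CI agent
    # without a Databricks cluster.
    return (SparkSession.builder
            .master("local[2]")
            .appName("pipeline-tests")
            .getOrCreate())


def clean_orders(df):
    # Hypothetical transformation under test: drop rows with null IDs
    # and normalize the amount column to a numeric type.
    return (df.dropna(subset=["order_id"])
              .withColumn("amount", F.col("amount").cast("double")))


def test_clean_orders_enforces_data_quality(spark):
    # Controlled test data with a known defect (null order_id).
    input_df = spark.createDataFrame(
        [("o1", "10.50"), (None, "3.00")],
        ["order_id", "amount"],
    )
    result = clean_orders(input_df)

    # The defective row must be filtered out and types normalized.
    assert result.count() == 1
    assert dict(result.dtypes)["amount"] == "double"
```

Wired into a CI/CD pipeline (for example via `pytest` in an Azure DevOps or GitHub Actions job), such tests gate the deployment: the updated pipeline only goes live if its outputs on known inputs still satisfy the expected data-quality contracts.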