
A Databricks data engineering team is designing a multi-task job to process customer orders. The pipeline includes two tasks:
OrderValidation: Checks each order for completeness and correctness, marking invalid orders as such.
OrderFulfillment: Processes and ships only valid orders.
The team configures the job so that OrderFulfillment only starts after OrderValidation has completed successfully. Why is this task dependency essential in the pipeline design?
A. To guarantee that orders are processed in the same order they were received.
B. To avoid simultaneous execution of tasks that could overload cluster resources.
C. To ensure that only orders passing validation are sent for fulfillment, preventing invalid orders from being processed.
D. To minimize the overall runtime of the pipeline by running tasks sequentially.
Correct Answer: C

Explanation:
Configuring OrderFulfillment to depend on the successful completion of OrderValidation ensures that only validated, correct orders are passed on to fulfillment, so invalid or incomplete orders are never shipped. This is essential for preserving data quality and business integrity. Options A, B, and D describe incidental effects of sequential execution rather than the purpose of the dependency; in fact, forcing the tasks to run one after the other tends to increase, not minimize, the overall pipeline runtime.
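As a rough illustration, this kind of dependency can be declared through the Databricks Jobs API (2.1) by giving the OrderFulfillment task a depends_on reference to OrderValidation; by default, a dependent task runs only if its upstream task succeeds. The sketch below is a minimal example, not the exact job from the question: the workspace URL, access token, job name, notebook paths, and cluster ID are all placeholder assumptions.

```python
import requests

# Placeholder workspace URL and token -- substitute your own values.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "customer-orders-pipeline",  # hypothetical job name
    "tasks": [
        {
            # Validates each order for completeness and correctness; runs first.
            "task_key": "OrderValidation",
            "notebook_task": {"notebook_path": "/Pipelines/order_validation"},
            "existing_cluster_id": "<cluster-id>",
        },
        {
            # Ships only valid orders; starts only after OrderValidation
            # completes successfully (the default behavior of depends_on).
            "task_key": "OrderFulfillment",
            "depends_on": [{"task_key": "OrderValidation"}],
            "notebook_task": {"notebook_path": "/Pipelines/order_fulfillment"},
            "existing_cluster_id": "<cluster-id>",
        },
    ],
}

# Create the multi-task job via the Jobs 2.1 REST endpoint.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

With this configuration, if OrderValidation fails, OrderFulfillment is skipped entirely, which is exactly the safeguard described in option C.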