
Answer-first summary for fast verification
Answer: **Option D** — relying on Databricks’ built-in job scheduling features to manage intra-Databricks dependencies and using external orchestrators like Apache Airflow for cross-system orchestration.
**Option D** is the most efficient approach because it combines the strengths of **Databricks Jobs** and **external orchestrators like Apache Airflow**.

- **Databricks Jobs** are excellent for managing **intra-Databricks dependencies** (task graphs, retries, cluster reuse), which helps reduce idle clusters and resource waste.
- **External orchestrators (e.g., Airflow)** are better suited for **cross-system orchestration**, such as coordinating Databricks with cloud storage, streaming systems, or downstream applications.
- This hybrid model avoids the overengineering of a custom framework (Option A), eliminates manual intervention (Option C), and prevents unnecessary complexity by not pushing all pipeline logic outside Databricks (Option B).

### ❌ Why Other Options Are Less Effective

- **A**: Custom frameworks increase maintenance overhead and are error-prone.
- **B**: Airflow alone ignores Databricks’ native optimizations.
- **C**: Manual triggering does not scale and wastes resources.
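To make the hybrid model concrete, here is a minimal sketch of the cross-system half: an external orchestrator (such as an Airflow task) starting a Databricks job via the Jobs API 2.1 `run-now` endpoint, while the job's internal task graph stays defined natively in Databricks Jobs. The host, token, and `job_id` values are placeholders, not real credentials.

```python
# Sketch: how an external orchestrator triggers a Databricks job.
# The intra-Databricks task dependencies live in the job definition itself;
# the orchestrator only decides *when* the job runs relative to other systems.
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build the POST request for the Jobs API 2.1 run-now endpoint."""
    payload = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder workspace URL, token, and job ID (assumptions for illustration).
req = build_run_now_request("https://example.cloud.databricks.com", "dapi-FAKE-TOKEN", 123)
# urllib.request.urlopen(req)  # left commented out: would call the live API
```

In a real Airflow deployment you would typically use the `DatabricksRunNowOperator` from the Databricks provider package instead of hand-rolling HTTP calls; the sketch above just shows what that operator does under the hood.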
Author: LeetQuiz Editorial Team
In a complex data pipeline with multiple interdependent tasks that span multiple systems, which approach is most efficient for managing these dependencies to minimize idle time and resource wastage?
A
Implementing a custom dependency resolution framework within Databricks notebooks.
B
Utilizing external orchestration tools like Apache Airflow to define and manage task dependencies outside of Databricks.
C
Manual oversight of task executions to trigger dependent tasks upon completions.
D
Relying on Databricks’ built-in job scheduling features to manage intra-Databricks dependencies and using external orchestrators like Apache Airflow for cross-system orchestration.