
A data engineer has set up two Jobs that each run nightly. The first Job starts at 12:00 AM, and it usually completes in about 20 minutes. The second Job depends on the first Job, and it starts at 12:30 AM. Sometimes, the second Job fails when the first Job does not complete by 12:30 AM.
Which of the following approaches can the data engineer use to avoid this problem?
A. They can utilize multiple tasks in a single job with a linear dependency
B. They can use cluster pools to help the Jobs run more efficiently
C. They can set up a retry policy on the first Job to help it run more quickly
D. They can limit the size of the output in the second Job so that it will not fail as easily
E. They can set up the data to stream from the first Job to the second Job
Explanation:
The correct answer is A. They can utilize multiple tasks in a single job with a linear dependency.
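In Databricks, when one workload depends on another, it's best to define both as tasks in a single job and declare the ordering with the depends_on parameter. The second task then starts only after the first task succeeds, rather than at a fixed clock time, so a slow run of the first task can no longer cause the second to fail.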
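
As a rough illustration of answer A, the sketch below builds a job definition in the general shape of a Databricks Jobs API 2.1 create request: one job, two tasks, and a depends_on entry linking them. The job name, task keys, and notebook paths are hypothetical, compute settings are omitted for brevity, and the run_order helper is not part of any Databricks library; it only shows the order the dependency implies.

```python
import json

# Hypothetical job definition: names, task keys, and notebook paths are
# placeholders, and cluster/compute settings are left out for brevity.
job_settings = {
    "name": "nightly-pipeline",
    "schedule": {
        # Quartz cron: second 0, minute 0, hour 0, every day (12:00 AM).
        "quartz_cron_expression": "0 0 0 * * ?",
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "first_task",
            "notebook_task": {"notebook_path": "/Jobs/first_job"},
        },
        {
            "task_key": "second_task",
            # Linear dependency: run only after first_task succeeds.
            "depends_on": [{"task_key": "first_task"}],
            "notebook_task": {"notebook_path": "/Jobs/second_job"},
        },
    ],
}


def run_order(tasks):
    """Illustrative helper: return task keys in an order that respects depends_on."""
    remaining = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    ordered = []
    while remaining:
        done = set(ordered)
        # A task is ready once all of its upstream tasks have been ordered.
        ready = sorted(k for k, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        ordered.extend(ready)
        for key in ready:
            del remaining[key]
    return ordered


if __name__ == "__main__":
    print(json.dumps(job_settings, indent=2))
    print(run_order(job_settings["tasks"]))
```

With this layout there is only one schedule, at 12:00 AM. The second task has no start time of its own, so the 12:30 AM race described in the question does not arise.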