
Answer-first summary for fast verification
Answer: They can utilize multiple tasks in a single job with a linear dependency
## Explanation

The correct answer is **A** because:

- **Multiple tasks in a single job with linear dependency** ensures that the second task only starts after the first task successfully completes. This eliminates the timing issue where the second job starts before the first job finishes.
- **Option B (cluster pools)** might improve efficiency but doesn't solve the dependency timing problem.
- **Option C (retry policy)** might help with transient failures but doesn't guarantee the first job completes before the second job starts.
- **Option D (limit output size)** addresses potential performance issues but doesn't solve the fundamental dependency timing problem.
- **Option E (streaming data)** is not appropriate for batch jobs and doesn't address the job dependency issue.

By using multiple tasks within a single job with linear dependencies, the data engineer can ensure proper execution order without relying on fixed start times that may cause race conditions.
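
To make the linear dependency concrete, here is a minimal sketch of a single job definition with two chained tasks. It assumes the payload shape of the Databricks Jobs API 2.1 (`tasks`, `task_key`, `depends_on`, `notebook_task`, `schedule`); the job name, task keys, and notebook paths are hypothetical, and cluster settings are left out for brevity.

```python
import json


def build_nightly_job(first_notebook: str, second_notebook: str) -> dict:
    """Return one job definition whose second task waits for the first."""
    return {
        "name": "nightly_pipeline",  # hypothetical job name
        "schedule": {
            # Quartz cron: run once a day at 12:00 AM
            "quartz_cron_expression": "0 0 0 * * ?",
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                "task_key": "first_task",
                "notebook_task": {"notebook_path": first_notebook},
            },
            {
                "task_key": "second_task",
                # Linear dependency: starts only after first_task succeeds,
                # so no fixed 12:30 AM start time is needed.
                "depends_on": [{"task_key": "first_task"}],
                "notebook_task": {"notebook_path": second_notebook},
            },
        ],
    }


# Hypothetical notebook paths, for illustration only
job = build_nightly_job("/Jobs/first", "/Jobs/second")
print(json.dumps(job, indent=2))
```

Only the job as a whole carries a schedule. The second task has no start time of its own, which is what removes the race between the two original Jobs.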
Author: LeetQuiz
Question 38
A data engineer has set up two Jobs that each run nightly. The first Job starts at 12:00 AM, and it usually completes in about 20 minutes. The second Job depends on the first Job, and it starts at 12:30 AM. Sometimes, the second Job fails when the first Job does not complete by 12:30 AM.
Which of the following approaches can the data engineer use to avoid this problem?
- A. They can utilize multiple tasks in a single job with a linear dependency
- B. They can use cluster pools to help the Jobs run more efficiently
- C. They can set up a retry policy on the first Job to help it run more quickly
- D. They can limit the size of the output in the second Job so that it will not fail as easily
- E. They can set up the data to stream from the first Job to the second Job