
Answer-first summary for fast verification
Answer: B. Schedule a job to execute the pipeline once an hour on a new job cluster
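The requirement is hourly dashboard updates at the lowest cost, and the pipeline takes about 10 minutes per run under normal operating conditions. Option B runs the pipeline once an hour on a new job cluster, which is created for the run and terminated when the run finishes, so compute is billed for roughly 10 minutes plus startup time each hour rather than the full 60. Option C keeps a Structured Streaming query, and therefore its cluster, running between the 60-minute triggers. Option D relies on an interactive (all-purpose) cluster, which is typically billed at a higher rate than job compute and either stays up between runs or has to be restarted for each one. Option A can trigger runs more often than once an hour if data lands several times per hour, adding compute the requirement does not call for. B is therefore the most cost-effective option that still meets the hourly SLA.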
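The cost argument comes down to cluster uptime, which a few lines of arithmetic can illustrate. This is a rough sketch, not a pricing model: the 10-minute run time comes from the question, while the 5-minute cluster startup overhead and the helper name `daily_uptime_minutes` are assumptions made for illustration, and per-minute billing rates are left out entirely.

```python
MINUTES_PER_DAY = 24 * 60


def daily_uptime_minutes(runs_per_day, run_minutes, startup_minutes=0):
    """Cluster minutes per day when a cluster exists only for the duration of each run."""
    return runs_per_day * (run_minutes + startup_minutes)


# Option B: 24 hourly runs of 10 minutes each (from the question),
# plus an ASSUMED 5 minutes of cluster startup per run.
job_cluster = daily_uptime_minutes(runs_per_day=24, run_minutes=10, startup_minutes=5)

# Options C and D in the worst case: a cluster that stays up all day.
always_on = MINUTES_PER_DAY

print(job_cluster, always_on, round(job_cluster / always_on, 2))
```

Under these assumptions the job cluster is up for a quarter of the time an always-on cluster would be; the exact ratio depends on the real startup time and on the rate charged for each compute type.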
Author: LeetQuiz Editorial Team
What is the most cost-effective configuration that will satisfy the business reporting team's requirement for hourly dashboard data updates, given that the ETL pipeline processing time is 10 minutes under normal operating conditions?
A. Configure a job that executes every time new data lands in a given directory
B. Schedule a job to execute the pipeline once an hour on a new job cluster
C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
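For readers who want to see what option B might look like in practice, here is a minimal sketch of job settings in the shape used by the Databricks Jobs API (`schedule` with a Quartz cron expression, and a task that declares `new_cluster` rather than pointing at an existing cluster). The job name, notebook path, runtime version, node type, and worker count are all hypothetical placeholders, not values from the question.

```python
import json

# Sketch of job settings for an hourly run on a new job cluster.
# All concrete values below are placeholders chosen for illustration.
job_settings = {
    "name": "hourly-etl",  # hypothetical job name
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # at minute 0 of every hour
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "run_pipeline",
            "notebook_task": {"notebook_path": "/Pipelines/etl"},  # hypothetical path
            # A new job cluster is created for each run and terminated afterwards.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",  # placeholder node type
                "num_workers": 2,  # placeholder size
            },
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```

Option D would differ mainly in replacing `new_cluster` with a reference to an already running all-purpose cluster, which is what keeps compute alive between runs.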