
Answer-first summary for fast verification
Answer: Schedule a job to run the pipeline once per hour using a new, automated job cluster.
### Explanation

**Correct Answer: C**

Automated **job clusters** (also known as ephemeral clusters) are the most cost-efficient choice for periodic batch processing in Databricks. A job cluster is provisioned when the job starts and terminated immediately upon completion, so you pay only for the ~10 minutes of compute per hour (plus a small cluster-startup overhead) rather than for idle time between runs.

**Analysis of Other Options:**

* **Structured Streaming (60-minute trigger):** A streaming job typically requires a cluster to remain active 24/7. Even with a long trigger interval, you incur costs while the cluster sits idle between micro-batches.
* **Interactive clusters:** These are intended for ad-hoc analysis and are billed at a higher rate than job clusters. Running an hourly job on a dedicated interactive cluster wastes roughly 50 minutes of idle time every hour at a premium price.
* **Manual triggers:** Not a scalable or reliable way to meet a production SLA; it introduces human error and lacks automation.
* **Event-driven (file arrival):** Does not guarantee an hourly cadence. If files arrive irregularly (e.g., every 5 minutes or every 2 hours), the dashboard updates either too frequently (raising costs) or too slowly (violating the SLA).
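To make option C concrete, an hourly schedule running on a new job cluster can be expressed as a Databricks Jobs API 2.1 job definition. The sketch below builds the payload as a Python dict; the job name, notebook path, node type, and Spark version are illustrative placeholders, not values from the question.

```python
# Sketch of a Databricks Jobs API 2.1 job definition: an hourly schedule
# running one task on a new (ephemeral) job cluster. Placeholder values
# (job name, notebook path, node type, Spark version) are assumptions.
import json

job_definition = {
    "name": "hourly-dashboard-refresh",
    "schedule": {
        # Quartz cron (sec min hour dom month dow): fire at minute 0 of every hour.
        "quartz_cron_expression": "0 0 * * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "refresh_pipeline",
            "notebook_task": {"notebook_path": "/Pipelines/refresh_dashboard"},
            # "new_cluster" requests an automated job cluster: created when the
            # run starts and terminated when it finishes, so no idle cost.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "max_concurrent_runs": 1,
}

print(json.dumps(job_definition, indent=2))
```

The same fields map onto the job-creation UI; the key point is `new_cluster` (an ephemeral job cluster) rather than `existing_cluster_id` (a long-running cluster you would pay for around the clock).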
Author: LeetQuiz Editorial Team
A business reporting team requires their dashboards to be updated every hour. Given that the data pipeline takes approximately 10 minutes to process, which configuration provides the most cost-effective solution to satisfy this Service Level Agreement (SLA) under standard operating conditions?
A. Schedule a Structured Streaming job with a trigger interval of 60 minutes.
B. Manually trigger the pipeline execution whenever the reporting team refreshes their dashboard.
C. Schedule a job to run the pipeline once per hour using a new, automated job cluster.
D. Schedule a job to run the pipeline once every hour on a dedicated, long-running interactive cluster.
E. Configure an event-driven job to execute whenever new data files are uploaded to a specific storage directory.