When deploying Structured Streaming jobs in production, which configuration enables automatic recovery from query failures while maintaining cost efficiency?
Explanation:
The correct configuration for scheduling Structured Streaming jobs in production is a New Job Cluster with Unlimited Retries and Maximum Concurrent Runs set to 1. This recovers automatically from query failures and keeps costs low. A new job cluster is cheaper than an all-purpose cluster because it is billed at the lower jobs compute rate and is terminated when the run ends. Unlimited retries mean that when a streaming query fails, the scheduler restarts the run, and the query resumes from its checkpoint (assuming a checkpoint location is configured). Limiting concurrent runs to 1 ensures that only one instance of the query writes to the same checkpoint and output at a time, which prevents duplicated data and conflicts.

Options A and B are incorrect. A allows unlimited concurrent runs, so several copies of the same query can run at once. B allows no retries, which rules out automatic recovery. Options C and E are incorrect because they use an Existing All-Purpose Cluster, which costs more and is designed for interactive work rather than scheduled production workloads.
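To make the three settings concrete, here is a minimal sketch that builds a job definition shaped like a Databricks Jobs API 2.1 `jobs/create` request body. The fields `max_concurrent_runs`, `new_cluster` and `max_retries` (where `-1` means retry indefinitely) are the ones that correspond to the answer. The notebook path, runtime version, node type and worker count are hypothetical placeholders, not recommendations.

```python
import json

# Hypothetical placeholders: substitute your own notebook, runtime and node type.
NOTEBOOK_PATH = "/Repos/example/streaming_ingest"  # hypothetical path
SPARK_VERSION = "13.3.x-scala2.12"                 # assumed runtime string
NODE_TYPE_ID = "i3.xlarge"                         # assumed (AWS) node type


def build_streaming_job_payload(name: str) -> dict:
    """Return a Jobs API 2.1-style create payload for a streaming job."""
    return {
        "name": name,
        # At most one active run, so two runs never share the same checkpoint.
        "max_concurrent_runs": 1,
        "tasks": [
            {
                "task_key": "stream",
                # New job cluster: created for the run and terminated when it ends,
                # instead of pointing at an existing all-purpose cluster.
                "new_cluster": {
                    "spark_version": SPARK_VERSION,
                    "node_type_id": NODE_TYPE_ID,
                    "num_workers": 2,  # hypothetical size
                },
                "notebook_task": {"notebook_path": NOTEBOOK_PATH},
                # -1 asks the scheduler to retry a failed run indefinitely.
                "max_retries": -1,
            }
        ],
    }


payload = build_streaming_job_payload("streaming-ingest")
print(json.dumps(payload, indent=2))
```

The resulting JSON could be sent to the `POST /api/2.1/jobs/create` endpoint. The same three choices are available in the Jobs UI as the cluster type, the retry policy and the maximum concurrent runs setting.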