
Explanation:
Production Structured Streaming jobs should be configured with Unlimited Retries and 1 Maximum Concurrent Run. Unlimited retries let the query restart automatically after a failure, and limiting the job to one concurrent run prevents overlapping runs of the same query, which keeps the data consistent. In addition, always run the job on a new job cluster with the latest Spark version (or at least version 2.1) for recoverability after upgrades. Do not set a schedule or a timeout, because streaming queries are designed to run indefinitely. Notifications can be configured to alert on failure.
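As an illustration, the recommendations above can be written down as a job definition plus a small sanity check. This is only a sketch: the field names (`max_retries`, `max_concurrent_runs`, `timeout_seconds`, `schedule`, `new_cluster`, `email_notifications`) are assumed to follow the Databricks Jobs API 2.0 payload, where `max_retries` set to -1 is assumed to mean "retry indefinitely". The job name, notebook path, cluster values, e-mail address, and the `check_streaming_job` helper are hypothetical. Newer API versions may place some of these fields at the task level, so check the documentation for the version in use.

```python
# Hypothetical job settings; field names assumed from Databricks Jobs API 2.0.
# All concrete values (name, path, cluster sizing, e-mail) are placeholders.
streaming_job_settings = {
    "name": "example-structured-streaming-job",
    # A new job cluster per run, rather than an existing shared cluster.
    "new_cluster": {
        "spark_version": "<latest-runtime-version>",
        "node_type_id": "<node-type>",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/path/to/streaming_notebook"},
    # Assumed convention: -1 means unlimited retries.
    "max_retries": -1,
    # Never allow two runs of the same streaming query to overlap.
    "max_concurrent_runs": 1,
    # Alert on failure; no "schedule" and no "timeout_seconds" are set.
    "email_notifications": {"on_failure": ["oncall@example.com"]},
}


def check_streaming_job(settings):
    """Return a list of deviations from the recommendations in the explanation."""
    problems = []
    if settings.get("max_retries") != -1:
        problems.append("max_retries should be -1 (unlimited retries)")
    if settings.get("max_concurrent_runs", 1) != 1:
        problems.append("max_concurrent_runs should be 1")
    if "schedule" in settings:
        problems.append("streaming jobs should not have a schedule")
    if settings.get("timeout_seconds", 0) != 0:
        problems.append("streaming jobs should not have a timeout")
    if "new_cluster" not in settings:
        problems.append("use a new job cluster")
    return problems
```

Calling `check_streaming_job(streaming_job_settings)` returns an empty list for the settings above. A configuration with no retries and a schedule would be reported with one message per deviation.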
What is the recommended retry policy for production Structured Streaming jobs to ensure reliability and correctness?
A. No Retries, with Unlimited Concurrent Runs
B. Unlimited Retries, with 1 Maximum Concurrent Run
C. 1 Retry, with 1 Maximum Concurrent Run
D. No Retries, with 1 Maximum Concurrent Run
E. Unlimited Retries, with Unlimited Concurrent Runs