Databricks Certified Data Engineer - Professional


Explanation:

To keep production Structured Streaming jobs reliable and correct, configure them with Unlimited Retries and 1 Maximum Concurrent Run. Unlimited retries let the job restart the query automatically after any failure. The one-run limit guarantees that only one instance of each query is active at a time, so restarted runs never overlap. Additionally, always run the job on a new job cluster with the latest Spark version (or at least Spark 2.1), because queries started on Spark 2.1 and above can be recovered after query and Spark version upgrades. Do not set a schedule or a timeout, since streaming queries are designed to run indefinitely. Configure notifications so that failures trigger an alert. Reference
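As a sketch of how these settings could map onto a job definition, the Python snippet builds a payload shaped like a Databricks Jobs API 2.1 job-settings request and checks it against the recommendations. The field names (`max_concurrent_runs`, `max_retries` with `-1` meaning unlimited, `timeout_seconds` with `0` meaning no timeout, `new_cluster`, `email_notifications`) are assumed from the Jobs API 2.1. The job name, notebook path, runtime version, node type, worker count, and e-mail address are hypothetical placeholders, not values from this question.

```python
import json

# Hypothetical placeholders: substitute your own values.
NOTEBOOK_PATH = "/Repos/example/streaming_ingest"  # hypothetical notebook
SPARK_VERSION = "13.3.x-scala2.12"                 # assumed runtime; prefer the latest available
NODE_TYPE_ID = "i3.xlarge"                         # hypothetical (AWS) node type
ALERT_EMAIL = "oncall@example.com"                 # hypothetical alert address


def streaming_job_settings() -> dict:
    """Build a Jobs API 2.1-style payload that follows the streaming recommendations."""
    return {
        "name": "streaming-ingest",  # hypothetical job name
        "max_concurrent_runs": 1,    # only one instance of the query may be active
        "email_notifications": {"on_failure": [ALERT_EMAIL]},  # alert on failures
        # No "schedule" key: the stream runs continuously and retries restart it.
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": NOTEBOOK_PATH},
                "new_cluster": {  # a fresh job cluster for every run
                    "spark_version": SPARK_VERSION,
                    "node_type_id": NODE_TYPE_ID,
                    "num_workers": 2,  # hypothetical cluster size
                },
                "max_retries": -1,     # -1 = retry indefinitely
                "timeout_seconds": 0,  # 0 = no timeout
            }
        ],
    }


def violations(settings: dict) -> list[str]:
    """List every way a payload departs from the recommendations (empty = compliant)."""
    problems = []
    if settings.get("max_concurrent_runs") != 1:
        problems.append("max_concurrent_runs should be 1")
    if "schedule" in settings:
        problems.append("streaming jobs should not have a schedule")
    for task in settings.get("tasks", []):
        key = task.get("task_key", "?")
        if task.get("max_retries") != -1:
            problems.append(f"{key}: max_retries should be -1 (unlimited)")
        if task.get("timeout_seconds", 0) != 0:
            problems.append(f"{key}: timeout_seconds should be 0 (no timeout)")
        if "new_cluster" not in task:
            problems.append(f"{key}: use a new job cluster")
    return problems


if __name__ == "__main__":
    settings = streaming_job_settings()
    print(json.dumps(settings, indent=2))
    print(violations(settings))  # prints []
```

Setting `max_retries` to 1 (as in option C) or raising `max_concurrent_runs` above 1 (as in options A and E) makes `violations` report the mismatch, which mirrors why only option B satisfies both constraints.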


What is the recommended retry policy for production Structured Streaming jobs to ensure reliability and correctness?

A. No Retries, with Unlimited Concurrent Runs (4.9%)
B. Unlimited Retries, with 1 Maximum Concurrent Run (66.2%)
C. 1 Retry, with 1 Maximum Concurrent Run (15.2%)
D. No Retries, with 1 Maximum Concurrent Run (8.8%)
E. Unlimited Retries, with Unlimited Concurrent Runs (4.9%)
