In a Spark Structured Streaming application with stateful operations, what is the optimal strategy for ensuring efficient fault tolerance through checkpointing while minimizing performance overhead?
A
Setting the checkpoint interval to a very high value to reduce the frequency of checkpoint writes, trading off recovery time for performance.
B
Configuring checkpointing to HDFS or a cloud-based storage system, carefully selecting the checkpoint interval to balance performance and recovery needs.
C
Disabling checkpointing and relying on Spark's in-memory state management to recover from failures.
D
Using local file storage for checkpoint data to speed up read/write operations, despite potential data loss risks.