
In a distributed computing environment, a data engineer is setting up a streaming data pipeline with Apache Spark Structured Streaming. The pipeline includes a stateful operation that maintains a running count of events per key. Which configuration is crucial for achieving the best performance and ensuring fault tolerance for this stateful operation?
A. Broadcasting join tables to all executors to reduce shuffle during state updates.
B. Using a stateful operation that stores state in local memory for faster access.
C. Configuring checkpointing to HDFS to ensure fault tolerance.
D. Disabling write-ahead logs to increase the throughput of the streaming application.
E. Increasing the number of shuffle partitions to maximize parallelism.