Databricks Certified Machine Learning - Associate

Ultimate access to all questions.

In a distributed computing environment, a data engineer is setting up a streaming data pipeline with Apache Spark Structured Streaming. This pipeline features a stateful operation designed to monitor the running count of events by key. Which configuration is crucial for achieving the best performance and ensuring fault tolerance for this stateful operation?

Real Exam

Broadcasting join tables to all executors to reduce shuffle during state updates.

19.4%

Using a stateful operation that stores state in local memory for faster access.

19.4%

Loading comments...