Databricks Certified Data Engineer - Professional

In Spark Structured Streaming, you must implement a solution that maintains and updates state over time to compute time-based aggregations, such as windowed averages or counts. The solution must scale, handle large volumes of data efficiently, and stay within the organization's cost constraints. Which of the following options BEST describes how to implement stateful processing in Spark Structured Streaming for this scenario? Choose the single best option.

Simulated

Stateful processing is not suitable for time-based aggregations in Spark Structured Streaming.

9.6%

Stateful processing can be implemented using the 'mapGroupsWithState' function, which allows for custom state management and is ideal for complex aggregations over time.

Stateful processing can only be achieved by using external databases to maintain the state, as Spark Structured Streaming does not support stateful operations internally.

23.5%

Stateful processing can be implemented using the 'updateStateByKey' function, which is simpler but less flexible than 'mapGroupsWithState' for time-based aggregations.

10.7%
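
To make the idea of "maintaining and updating state over time" concrete, here is a minimal plain-Python sketch of what a per-key, per-window state update does conceptually. It does not use the Spark API: in Structured Streaming the engine keeps this kind of state in its own state store, and a function such as `mapGroupsWithState` (available in the Scala and Java APIs) lets you supply custom update logic. The names `WindowState`, `update_state`, `evict_expired`, the 60-second window size, and the sample sensor data are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 60  # assumed tumbling-window size, for illustration only


@dataclass
class WindowState:
    """Running aggregate kept for one (key, window) pair."""
    count: int = 0
    total: float = 0.0

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def window_start(event_time: int) -> int:
    """Start of the tumbling window that contains event_time (in seconds)."""
    return event_time - (event_time % WINDOW_SECONDS)


def update_state(state, batch):
    """Fold one micro-batch of (key, event_time, value) records into the state."""
    for key, event_time, value in batch:
        slot = state[(key, window_start(event_time))]
        slot.count += 1
        slot.total += value
    return state


def evict_expired(state, watermark):
    """Remove and return windows that ended at or before the watermark."""
    expired = [k for k in state if k[1] + WINDOW_SECONDS <= watermark]
    return {k: state.pop(k) for k in expired}


# State survives across micro-batches; each batch only updates it.
state = defaultdict(WindowState)
update_state(state, [("sensor-a", 5, 10.0), ("sensor-a", 30, 20.0), ("sensor-b", 61, 7.0)])
update_state(state, [("sensor-a", 59, 30.0), ("sensor-a", 65, 4.0)])

# Once the watermark passes the end of a window, its result is final
# and its state can be dropped, which keeps the state size bounded.
closed = evict_expired(state, watermark=60)
for (key, start), agg in closed.items():
    print(key, start, agg.count, agg.average)
```

The eviction step is the part that matters for scalability and cost: without a watermark-style cutoff, state for old windows would accumulate indefinitely.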