
Answer-first summary for fast verification
Answer: B. Configure Spark Structured Streaming to read data, handle schema drift with schema evolution, and use repartitioning to manage data across partitions.
Option B is correct because it is the only choice that addresses both requirements. Schema evolution lets the job adapt when the event schema changes, and repartitioning distributes the data across partitions so it can be processed efficiently in parallel. Option A ignores schema drift and confines processing to a single partition, option C relies on a fixed schema and ignores partitions, and option D handles schema drift but ignores partitioning.
Author: LeetQuiz Editorial Team
In a scenario where you are processing event data from a gaming platform using Spark Structured Streaming, how would you design your Spark job to handle schema drift and process time series data efficiently? Additionally, describe how you would manage data across partitions and within one partition to ensure optimal performance.
A. Use Spark Structured Streaming to read data, ignore schema drift, and process data only within one partition.
B. Configure Spark Structured Streaming to read data, handle schema drift with schema evolution, and use repartitioning to manage data across partitions.
C. Set up Spark Structured Streaming to read data, ignore partitions, and use a fixed schema without handling changes.
D. Use Spark Structured Streaming to read data, focus only on schema drift, and ignore data partitioning.
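The ideas behind option B can be illustrated without a cluster. The following is a minimal sketch in plain Python, not Spark code: it mimics what a single micro-batch might go through, assuming events arrive as dictionaries. The field names (`player_id`, `event_time`, `action`, `level`), the helper functions, and the choice of two partitions are all hypothetical and chosen only for illustration. In a real job, Spark's own repartitioning and the sink's schema evolution support would take the place of these helpers.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical gaming events; two of them carry a new "level" field (schema drift).
events = [
    {"player_id": "p1", "event_time": 3, "action": "jump"},
    {"player_id": "p2", "event_time": 1, "action": "login", "level": 7},
    {"player_id": "p1", "event_time": 1, "action": "login", "level": 2},
    {"player_id": "p2", "event_time": 2, "action": "shoot"},
]


def evolve_schema(known_fields, record):
    """Return known_fields extended with any new fields found in record."""
    return known_fields + [f for f in record if f not in known_fields]


def conform(record, fields):
    """Give every row the same shape; fields the record lacks become None."""
    return {f: record.get(f) for f in fields}


def partition_of(key, num_partitions):
    """Deterministic hash partitioning (crc32 is stable across runs)."""
    return crc32(key.encode("utf-8")) % num_partitions


def process_batch(batch, known_fields, num_partitions=2):
    # 1. Schema drift: widen the schema instead of rejecting unknown fields.
    for record in batch:
        known_fields = evolve_schema(known_fields, record)

    # 2. Across partitions: route all events of one player to the same partition.
    partitions = defaultdict(list)
    for record in batch:
        target = partition_of(record["player_id"], num_partitions)
        partitions[target].append(conform(record, known_fields))

    # 3. Within a partition: keep each player's events in time order.
    for rows in partitions.values():
        rows.sort(key=lambda r: (r["player_id"], r["event_time"]))

    return known_fields, dict(partitions)


fields, parts = process_batch(events, ["player_id", "event_time", "action"])
print(fields)
for pid, rows in sorted(parts.items()):
    print(pid, rows)
```

Keying the partitioner on the player identifier keeps each player's time series together, so the per-partition sort is enough to restore event order without a global shuffle. Widening the schema and filling gaps with `None` is one possible drift policy; rejecting or quarantining unexpected fields would be an equally valid design choice depending on the downstream consumers.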