
You are designing a Spark Structured Streaming application to process a large dataset in real time for a financial analytics platform. The platform requires high throughput and low latency to handle millions of transactions per second. During initial testing, you notice that query performance is significantly impacted by data serialization overhead. Considering the need for cost efficiency, compliance with financial data regulations, and scalability, which of the following optimizations would BEST address the serialization issue while meeting all of the platform's requirements? (Choose one option.)
A. Implement a custom serialization library tailored specifically for financial data to minimize overhead.
B. Increase the cluster size to distribute the serialization workload across more nodes, thereby reducing latency.
C. Switch to a columnar storage format like Apache Parquet for its efficient compression and encoding schemes, which reduce serialization overhead.
D. Reduce the volume of data being processed by filtering out non-essential transactions before serialization.
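For context, option C could be realized with a minimal Structured Streaming sketch like the one below. This is an illustrative example only: the built-in rate source stands in for the platform's real transaction stream (e.g. Kafka), and the output and checkpoint paths are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession

object ParquetSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transactions-to-parquet")
      .getOrCreate()

    // Stand-in for the real transaction source (assumption: a rate source for demo purposes).
    val transactions = spark.readStream
      .format("rate")                       // built-in test source emitting (timestamp, value) rows
      .option("rowsPerSecond", "1000")
      .load()

    // Writing the stream as Parquet stores records in a columnar, compressed,
    // encoded form, so downstream readers avoid row-by-row (de)serialization
    // of raw records.
    val query = transactions.writeStream
      .format("parquet")
      .option("path", "/tmp/transactions_parquet")             // illustrative path
      .option("checkpointLocation", "/tmp/transactions_ckpt")  // illustrative path
      .outputMode("append")                                     // file sinks require append mode
      .start()

    query.awaitTermination()
  }
}
```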