
Answer-first summary for fast verification
Answer: C — Switch to a columnar storage format like Apache Parquet for its efficient compression and encoding schemes, which reduce serialization overhead.
Switching to a columnar storage format such as Apache Parquet is the best option because its compression and encoding schemes (dictionary, run-length, and bit-packed encodings) significantly reduce serialization overhead. This meets the platform's requirements for high throughput and low latency, is cost-efficient because smaller files lower storage and I/O costs, supports compliance with financial data regulations through reliable, schema-enforced data handling, and scales with data volume. Option A might reduce overhead, but a custom serializer adds maintenance burden and is unlikely to match the efficiency, scalability, and ecosystem support of a widely adopted format like Parquet. Option B addresses latency by adding hardware cost rather than reducing the per-record serialization work itself. Option D may improve performance, but only at the expense of discarding potentially valuable data, which is not acceptable for a financial analytics platform.
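The size advantage of columnar layouts can be illustrated with a small, self-contained sketch. This is not Parquet itself — just a toy comparison, using only the standard library, between row-oriented JSON serialization and a simple columnar layout with dictionary encoding for low-cardinality string columns, similar in spirit to the per-column encodings Parquet applies. The sample transaction records and helper names are hypothetical.

```python
import json
import struct

# Hypothetical transaction records for a financial analytics workload,
# repeated to mimic a larger micro-batch.
transactions = [
    {"symbol": "AAPL", "side": "BUY", "qty": 100, "price": 189.5},
    {"symbol": "AAPL", "side": "SELL", "qty": 50, "price": 190.1},
    {"symbol": "MSFT", "side": "BUY", "qty": 200, "price": 410.2},
] * 1000

# Row-oriented: each record serialized independently, field names repeated
# for every row.
row_bytes = json.dumps(transactions).encode("utf-8")

def dictionary_encode(values):
    # Store each distinct value once, plus one small index byte per row
    # (toy limit: fewer than 256 distinct values per column).
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    codes = bytes(index[v] for v in values)
    return json.dumps(dictionary).encode("utf-8") + codes

# Columnar: one contiguous array per field; numeric columns are packed as
# fixed-width binary, string columns are dictionary-encoded.
symbol_col = dictionary_encode([t["symbol"] for t in transactions])
side_col = dictionary_encode([t["side"] for t in transactions])
qty_col = struct.pack(f"{len(transactions)}i", *(t["qty"] for t in transactions))
price_col = struct.pack(f"{len(transactions)}d", *(t["price"] for t in transactions))
col_bytes = symbol_col + side_col + qty_col + price_col

print(f"row-oriented JSON: {len(row_bytes)} bytes")
print(f"columnar + dict:   {len(col_bytes)} bytes")
```

Because field names and repeated string values are stored once instead of per row, the columnar layout is a fraction of the JSON size — and a smaller serialized footprint is precisely what cuts the serialization and I/O overhead the question describes.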
Author: LeetQuiz Editorial Team
You are designing a Spark Structured Streaming application to process a large dataset in real-time for a financial analytics platform. The platform requires high throughput and low latency to handle millions of transactions per second. During the initial testing phase, you notice that the performance of your queries is significantly impacted by data serialization overhead. Considering the need for cost efficiency, compliance with financial data regulations, and scalability, which of the following optimizations would BEST address the serialization issue while meeting all the platform's requirements? (Choose one option.)
A
Implement a custom serialization library tailored specifically for financial data to minimize overhead.
B
Increase the cluster size to distribute the serialization workload across more nodes, thereby reducing latency.
C
Switch to a columnar storage format like Apache Parquet for its efficient compression and encoding schemes, which reduce serialization overhead.
D
Reduce the volume of data being processed by filtering out non-essential transactions before serialization.