In a real-time pipeline built on Spark Structured Streaming, you must implement a deduplication mechanism so that each record in the stream is unique with respect to a specific key. The solution must handle late-arriving data efficiently and scale to millions of records per second. Given the constraints of cost, compliance, and scalability, which of the following approaches is the BEST way to achieve deduplication in this context? Choose the single best option.
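For context on what the question is probing: Spark Structured Streaming's built-in approach is to pair `withWatermark` with `dropDuplicates` on the key columns, which tolerates late data up to the watermark delay while keeping deduplication state bounded. The stateful mechanism behind that API can be sketched in plain Python (a hypothetical `StreamingDeduplicator` helper for illustration only, not the Spark API):

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    event_time: int  # event time in seconds since epoch, for simplicity

class StreamingDeduplicator:
    """Illustrative sketch of watermark-bounded key deduplication.

    Conceptually mirrors
    `df.withWatermark("event_time", "10 seconds").dropDuplicates(["key"])`:
    per-key state is kept only until the watermark passes it, so memory
    stays bounded while late-but-in-window records still deduplicate.
    """

    def __init__(self, delay_seconds: int):
        self.delay = delay_seconds
        self.max_event_time = 0
        self.seen: dict[str, int] = {}  # key -> event_time held in state

    @property
    def watermark(self) -> int:
        # Watermark trails the max observed event time by the allowed delay.
        return self.max_event_time - self.delay

    def process(self, rec: Record) -> bool:
        """Return True if the record is emitted (first occurrence), False if dropped."""
        self.max_event_time = max(self.max_event_time, rec.event_time)
        # Records older than the watermark are too late: drop them outright.
        if rec.event_time < self.watermark:
            return False
        # Duplicate key within the watermark window: drop.
        if rec.key in self.seen:
            return False
        self.seen[rec.key] = rec.event_time
        # Evict state the watermark has passed, bounding memory usage.
        self.seen = {k: t for k, t in self.seen.items() if t >= self.watermark}
        return True

# Usage: 10-second watermark delay.
dedup = StreamingDeduplicator(delay_seconds=10)
print(dedup.process(Record("a", 100)))  # True  (first occurrence)
print(dedup.process(Record("a", 105)))  # False (duplicate key)
print(dedup.process(Record("b", 120)))  # True  (watermark advances to 110)
print(dedup.process(Record("c", 95)))   # False (older than watermark, dropped)
```

Note that real Spark distributes this state across executors and checkpoints it for fault tolerance; the sketch only shows why a watermark is what lets deduplication state stay finite on an unbounded stream.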