You are tasked with optimizing a Spark Structured Streaming job that processes a large dataset. The job's performance is degrading because of a large number of small files (the 'small files problem'). Given the constraints of cost efficiency, compliance with data governance policies, and scalability, which of the following solutions BEST addresses this issue? Choose one option.