You are developing a PySpark application to process a large dataset stored in a CSV file. The dataset contains millions of records, and you need to perform several transformations. Your goal is to optimize the job's performance while considering cost, compliance, and scalability. Which of the following strategies would you choose and why? (Choose one option.)