You are designing a data processing pipeline in Azure Databricks to handle large-scale data analytics. The pipeline must efficiently process data of varying sizes, from gigabytes to terabytes, while optimizing for both cost and performance. The solution must also ensure that data processing can scale dynamically with the workload, without manual intervention. Given these requirements, which of the following strategies is BEST to implement for ensuring data scalability and performance optimization in Spark? Choose the single best option.
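For context, here is a minimal, hedged sketch (not the exam answer) of how dynamic scaling is commonly expressed at the Spark level, using standard Spark properties for dynamic resource allocation and adaptive query execution. The application name and the min/max executor values are placeholder assumptions, not values from the question; on Databricks, comparable behavior is typically configured through cluster autoscaling settings.

```python
from pyspark.sql import SparkSession

# Illustrative sketch: let Spark add or remove executors based on pending
# work, and let Adaptive Query Execution re-optimize shuffles and joins at
# runtime. Executor counts below are assumed placeholders.
spark = (
    SparkSession.builder
    .appName("analytics-pipeline")  # assumed name for illustration
    # Scale executors up/down automatically with the workload.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")    # assumed floor
    .config("spark.dynamicAllocation.maxExecutors", "50")   # assumed ceiling
    # AQE adjusts shuffle partitions and join strategies at runtime,
    # which matters when input sizes range from gigabytes to terabytes.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)
```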