
You are working on a data transformation project using Apache Spark in Azure Databricks. The project involves joining two large datasets, 'Sales' and 'Products', on the 'ProductID' field and then performing a series of transformations: filtering, aggregation, and sorting. Which Spark API would you use, and how would you optimize the performance of the transformations?
A. Use RDDs for all transformations and optimize performance by increasing the number of partitions.
B. Use DataFrames for all transformations and optimize performance by caching frequently accessed data and using appropriate join strategies.
C. Use Datasets for all transformations and optimize performance by reducing the number of partitions.
D. Use RDDs for all transformations and optimize performance by reducing the number of partitions.
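Option B reflects standard Spark guidance: DataFrames go through the Catalyst optimizer, and when one side of a join is small, a broadcast hash join avoids shuffling the large table (in PySpark this is hinted with `pyspark.sql.functions.broadcast`, and reused results can be kept with `.cache()`). The following is a minimal pure-Python sketch of the broadcast-hash-join idea, not actual Spark code; the table and column names (Sales, Products, ProductID) come from the question, while the sample rows and the `Amount`/`Name` fields are illustrative assumptions.

```python
# Illustrative sketch of a broadcast hash join followed by
# filter -> aggregate -> sort, mirroring the question's pipeline.
# Sample rows and the Amount/Name fields are made up for the example.

sales = [  # the large fact table (one row per sale)
    {"SaleID": 1, "ProductID": 101, "Amount": 250.0},
    {"SaleID": 2, "ProductID": 102, "Amount": 80.0},
    {"SaleID": 3, "ProductID": 101, "Amount": 120.0},
]
products = [  # the smaller dimension table
    {"ProductID": 101, "Name": "Widget"},
    {"ProductID": 102, "Name": "Gadget"},
]

# "Broadcast" the small side: build an in-memory hash map once so each
# Sales row is matched by a cheap lookup instead of a full shuffle,
# which is what Spark's broadcast hash join does across executors.
product_by_id = {p["ProductID"]: p for p in products}

joined = [
    {**s, "Name": product_by_id[s["ProductID"]]["Name"]}
    for s in sales
    if s["ProductID"] in product_by_id
]

# Filter, aggregate, and sort, as in the question's transformation chain.
totals = {}
for row in joined:
    if row["Amount"] > 100.0:  # filter: keep only large sales
        totals[row["Name"]] = totals.get(row["Name"], 0.0) + row["Amount"]  # aggregate
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # sort descending
```

In real Spark the same pipeline would be expressed declaratively on DataFrames (join, `filter`, `groupBy().agg()`, `orderBy`), letting Catalyst pick the physical join strategy.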