
You are tasked with performing complex data transformations on a large dataset stored in Delta Lake within an Azure Databricks environment. The solution must be efficient, scalable, and maintainable, while also staying within cost constraints and complying with organizational data governance policies. Given these requirements, which of the following approaches would you implement? (Choose one)
A. Develop a single, comprehensive script that handles all data transformations in one execution, minimizing the number of jobs submitted to the Azure Databricks cluster to reduce costs.
B. Design the data transformations as a series of smaller, modular functions that can be independently tested and reused, leveraging Apache Spark's distributed processing capabilities for efficiency and scalability.
C. Manually process the dataset row by row using a custom script to ensure precise control over each transformation step, despite the potential impact on processing time and scalability.
D. Utilize a mix of programming languages and tools for different transformation steps, selecting each based on its specific strengths for the task at hand, to optimize performance and flexibility.
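For context, here is a minimal sketch of the modular pattern described in option B, assuming PySpark on Databricks. The table names, column names, and helper functions (drop_invalid_rows, normalize_amounts, add_processing_date) are hypothetical illustrations, not part of the question.

# Modular, independently testable transformation functions chained with
# DataFrame.transform(), so Spark can still plan and distribute the whole
# pipeline as one job. Table and column names below are placeholders.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # `spark` is predefined on Databricks


def drop_invalid_rows(df: DataFrame) -> DataFrame:
    """Remove rows with a missing key so downstream joins stay clean."""
    return df.filter(F.col("customer_id").isNotNull())


def normalize_amounts(df: DataFrame) -> DataFrame:
    """Cast the amount column to a consistent decimal type."""
    return df.withColumn("amount", F.col("amount").cast("decimal(18,2)"))


def add_processing_date(df: DataFrame) -> DataFrame:
    """Stamp each row with the processing date for auditability."""
    return df.withColumn("processed_date", F.current_date())


# Each function is small enough to unit-test against a tiny in-memory
# DataFrame, and the chain below never collects data to the driver.
raw = spark.table("sales.raw_transactions")  # hypothetical Delta table
curated = (
    raw.transform(drop_invalid_rows)
       .transform(normalize_amounts)
       .transform(add_processing_date)
)
curated.write.format("delta").mode("overwrite").saveAsTable("sales.curated_transactions")

Keeping each step as a pure function over a DataFrame is what makes the approach both reusable across jobs and easy to validate in isolation, while the heavy lifting remains in Spark's distributed engine.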