
You need to process data from multiple sources, perform complex transformations, and load the results into a data warehouse using Databricks. Which of the following approaches is the MOST efficient and scalable? Consider factors such as modularity, error handling, maintainability, and the ability to handle dependencies between tasks. Choose the best option from the four provided.
A
Implement a single task within the job that sequentially reads data from all sources, performs all transformations in one go, and then loads the results into the data warehouse.
B
Design a multi-task job where each task is dedicated to reading data from a specific source, performing the necessary transformations, and then passing the processed data to the next task in sequence.
C
Create a multi-task job with each task independently reading from a specific source, performing transformations, and writing the results to a separate data warehouse, without any task dependencies.
D
Structure a multi-task job with a task for each data source, where each task writes the transformed data to a common data warehouse, and configure the tasks to run in parallel where possible to optimize performance.
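To make option D concrete, the sketch below shows roughly how such a job could be expressed as a Databricks Jobs API 2.1 payload: one task per source with no `depends_on` entries (so they can run in parallel), and an optional downstream task that waits on all of them. The notebook paths and task names here are hypothetical placeholders, not values from the question.

```json
{
  "name": "multi_source_etl",
  "tasks": [
    {
      "task_key": "ingest_source_a",
      "notebook_task": { "notebook_path": "/etl/transform_source_a" }
    },
    {
      "task_key": "ingest_source_b",
      "notebook_task": { "notebook_path": "/etl/transform_source_b" }
    },
    {
      "task_key": "finalize_warehouse",
      "depends_on": [
        { "task_key": "ingest_source_a" },
        { "task_key": "ingest_source_b" }
      ],
      "notebook_task": { "notebook_path": "/etl/finalize_load" }
    }
  ]
}
```

Because `ingest_source_a` and `ingest_source_b` declare no dependencies, the scheduler runs them concurrently; `finalize_warehouse` starts only after both succeed, which captures the per-source modularity, parallelism, and dependency handling the question asks about.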