
In a Databricks environment, you are tasked with processing and integrating data from two distinct sources: a large CSV file stored in an object storage service and a real-time streaming source. The project requires high efficiency, scalability, and the ability to handle the nuances of both batch and streaming processing. Given the need for optimal performance and minimal manual intervention, which approach should you choose? (Choose one correct answer from the options below.)
A
Utilize Databricks' file I/O capabilities for the CSV file and set up a separate cluster for the streaming data, ensuring isolation but potentially increasing complexity and cost.
B
Import the entire CSV file into a Databricks Delta table and attempt to use it as the sole source for processing both batch and streaming data, risking inefficiency and scalability issues.
C
Develop a custom solution to pre-merge the CSV and streaming data before processing, which may introduce delays and require significant maintenance.
D
Employ Databricks' integrated support for Delta and Structured Streaming to concurrently process the CSV file and real-time streaming data, leveraging built-in optimizations for performance and scalability.
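The integrated approach described in option D can be sketched with PySpark-style code: a one-time batch load of the CSV into a Delta table, plus a Structured Streaming query that appends real-time records to the same table. All paths, the Kafka topic, and option values here are illustrative assumptions, not details from the question; a real pipeline would also define an explicit schema rather than rely on inference.

```python
# Hedged sketch, assuming a SparkSession `spark` with Delta Lake support
# (as provided by a Databricks cluster). Paths, topic names, and checkpoint
# locations below are hypothetical placeholders.

def load_csv_to_delta(spark, csv_path, delta_path):
    """Batch-load a large CSV from object storage into a Delta table."""
    (spark.read
        .option("header", "true")
        .option("inferSchema", "true")   # prefer an explicit schema in production
        .csv(csv_path)
        .write
        .format("delta")
        .mode("overwrite")
        .save(delta_path))

def start_streaming_append(spark, bootstrap_servers, topic,
                           delta_path, checkpoint_path):
    """Continuously append streaming records into the same Delta table."""
    stream_df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        .selectExpr("CAST(value AS STRING) AS value"))
    return (stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # enables exactly-once recovery
        .outputMode("append")
        .start(delta_path))
```

Because both the batch load and the stream write to one Delta table, downstream consumers can query a single, transactionally consistent source, which is what gives this option its performance and scalability edge over options A through C.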