
Answer-first summary for fast verification
Answer: Employ Databricks' integrated support for Delta and Structured Streaming to concurrently process the CSV file and real-time streaming data, leveraging built-in optimizations for performance and scalability.
Option D is correct because it relies on Databricks' native capabilities for both workloads. The CSV file is loaded in batch into a Delta table, which provides ACID transactions and optimized storage for large datasets, while Structured Streaming processes the real-time source incrementally. Because a Delta table can be used by both batch and streaming jobs, the two pipelines can run concurrently on the same platform with little manual intervention. Option A adds cost and operational complexity by splitting the work across separate clusters, Option B forces a single static table to stand in for a live stream, and Option C requires a custom pre-merge step that adds latency and maintenance overhead.
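The approach in Option D can be sketched in PySpark roughly as follows. This is a minimal illustration, not a reference implementation: the question does not name the streaming source, so Kafka is assumed here, and the function names, paths, topic, and column selection are all hypothetical.

```python
def load_csv_to_delta(spark, csv_path, delta_path):
    """Batch path: read the CSV from object storage and append it to a Delta table."""
    df = (
        spark.read.format("csv")
        .option("header", "true")        # assumption: the file has a header row
        .option("inferSchema", "true")   # assumption: schema inference is acceptable
        .load(csv_path)
    )
    df.write.format("delta").mode("append").save(delta_path)


def stream_kafka_to_delta(spark, bootstrap_servers, topic, delta_path, checkpoint_path):
    """Streaming path: read from Kafka (assumed source) and append to a Delta table."""
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # Kafka delivers key and value as binary, so cast them to strings.
    events = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )
    # The checkpoint location lets the query resume where it left off after a restart.
    return (
        events.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(delta_path)
    )
```

In a Databricks notebook, `spark` is already defined, so both functions can be called from the same cluster: the batch load runs to completion, while the streaming call returns a query handle that keeps running in the background.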
Author: LeetQuiz Editorial Team
In a Databricks environment, you are tasked with processing and integrating data from two distinct sources: a large CSV file stored in an object storage service and a real-time streaming source. The project requires high efficiency, scalability, and the ability to handle the nuances of both batch and streaming data processing. Considering the need for optimal performance and minimal manual intervention, which approach should you choose? (Choose one correct answer from the options below.)
A. Utilize Databricks' file I/O capabilities for the CSV file and set up a separate cluster for the streaming data, ensuring isolation but potentially increasing complexity and cost.
B. Import the entire CSV file into a Databricks Delta table and attempt to use it as the sole source for processing both batch and streaming data, risking inefficiency and scalability issues.
C. Develop a custom solution to pre-merge the CSV and streaming data before processing, which may introduce delays and require significant maintenance.
D. Employ Databricks' integrated support for Delta and Structured Streaming to concurrently process the CSV file and real-time streaming data, leveraging built-in optimizations for performance and scalability.