
Answer-first summary for fast verification
Answer: Stream data from multiplex bronze tables in real-time, applying data transformations and cleaning on-the-fly to ensure immediate data availability and quality.
The best approach is to stream data in real time and apply transformations and cleaning on the fly (Option C). This keeps data immediately available for decision-making while maintaining quality: records from sources of varying quality are validated and cleaned as they arrive, rather than in a separate pass that adds latency. Technologies such as Apache Kafka for streaming and Delta Lake for data management can help make the solution scalable and cost-effective. By contrast, batch processing (Options B and D) introduces latency, and skipping transformations and cleaning (Options A and D) compromises data quality.
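To make the idea in Option C concrete, here is a minimal plain-Python sketch of filtering one topic out of a multiplex bronze table and cleaning each record as it arrives. It is a simulation only, not Databricks or Spark code: the sample rows, the `topic` and `value` field names, the validation rules, and the functions `clean_order` and `stream_orders` are all hypothetical and chosen for illustration. On Databricks, the same logic would more likely be expressed as a Structured Streaming query over the bronze Delta table.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical multiplex bronze rows: one table, several topics, raw JSON payloads.
BRONZE_ROWS = [
    {"topic": "orders", "value": '{"order_id": 1, "amount": "19.99"}'},
    {"topic": "orders", "value": '{"order_id": null, "amount": "5.00"}'},  # missing key
    {"topic": "customers", "value": '{"customer_id": 7, "email": "a@example.com"}'},
    {"topic": "orders", "value": "not-json"},  # malformed payload
    {"topic": "orders", "value": '{"order_id": 2, "amount": "-3"}'},  # invalid amount
]


def clean_order(raw: str) -> Optional[Dict]:
    """Parse and validate one raw order payload; return None if it fails a check."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or record.get("order_id") is None:
        return None
    try:
        amount = float(record.get("amount"))
    except (TypeError, ValueError):
        return None
    if amount < 0:
        return None
    return {"order_id": int(record["order_id"]), "amount": amount}


def stream_orders(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Yield cleaned order records one at a time, as rows arrive."""
    for row in rows:
        if row["topic"] != "orders":  # demultiplex: keep only the 'orders' topic
            continue
        cleaned = clean_order(row["value"])
        if cleaned is not None:  # drop records that fail validation
            yield cleaned


silver_orders = list(stream_orders(BRONZE_ROWS))
print(silver_orders)
```

Because `stream_orders` is a generator, each record is validated and emitted as soon as it is read, so no separate batch-cleaning pass is needed. In this sample, only the first order passes every check.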
Author: LeetQuiz Editorial Team
In the context of streaming data from multiplex bronze tables in a Databricks environment, consider the following scenario: Your organization requires real-time data processing to support immediate decision-making. The data comes from various sources with varying quality levels, and there's a need to ensure high data quality without introducing significant latency. Additionally, the solution must be cost-effective and scalable to handle increasing data volumes. Which of the following approaches BEST meets these requirements? Choose one option.
A. Stream data from multiplex bronze tables in real-time, without any data transformations or cleaning, to minimize latency and costs.
B. Stream data from multiplex bronze tables in batches, applying data transformations and cleaning after each batch to ensure data quality at the expense of latency.
C. Stream data from multiplex bronze tables in real-time, applying data transformations and cleaning on-the-fly to ensure immediate data availability and quality.
D. Stream data from multiplex bronze tables in batches, without any data transformations or cleaning, to balance between latency and costs.