
Answer-first summary for fast verification
Answer: Use Azure Event Hubs as a front buffer to ingest data streams, then process and store the data in Delta Lake using Databricks Structured Streaming.
Using Azure Event Hubs as a front buffer decouples the ingestion layer from the processing layer, which is what makes the pipeline resilient to high-velocity, high-volume IoT traffic: Event Hubs can absorb millions of events per second, so bursts from devices never overwhelm the Databricks clusters downstream. Processing and storing the data in Delta Lake with Databricks Structured Streaming then provides reliability and scalability on the processing side, since Delta Lake offers ACID transactions, schema enforcement, and data versioning, while Structured Streaming processes the stream continuously with checkpointing for fault tolerance, enabling real-time analytics. This layout also integrates cleanly with the rest of the Databricks platform for downstream analysis.

The alternatives fall short: deploying an individual job per device (option A) wastes resources and creates contention as the device fleet grows; hourly batch ingestion (option B) introduces latency that defeats the real-time requirement; and streaming directly into Delta tables with no intermediate buffer (option C) leaves no layer to absorb traffic spikes or validate data before it lands.
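The recommended pattern (option D) can be sketched in PySpark as below. This is a minimal, hedged sketch, not a definitive implementation: it assumes you read from Event Hubs through its Kafka-compatible endpoint, and the namespace, event hub name, connection string, checkpoint path, table name, and event schema are all hypothetical placeholders you would replace with your own.

```python
# Sketch: Event Hubs (front buffer) -> Structured Streaming -> Delta Lake.
# Assumes the Event Hubs Kafka-compatible endpoint; NAMESPACE, EVENT_HUB,
# CONNECTION_STRING, the checkpoint path, and the schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, TimestampType,
)

spark = SparkSession.builder.appName("iot-ingest").getOrCreate()

# Assumed shape of each IoT event payload.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the stream from Event Hubs via its Kafka endpoint; Event Hubs is
# the buffer that absorbs ingest spikes ahead of the Databricks cluster.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "NAMESPACE.servicebus.windows.net:9093")
    .option("subscribe", "EVENT_HUB")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="$ConnectionString" password="CONNECTION_STRING";',
    )
    .load()
)

# Parse the JSON payload, then write continuously to a Delta table.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

(
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/iot")  # fault-tolerant recovery
    .outputMode("append")
    .toTable("iot_events")
)
```

The checkpoint location is what lets the stream restart from where it left off after a failure, which is a key part of the reliability this option provides.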
Author: LeetQuiz Editorial Team
You're tasked with setting up a real-time data ingestion pipeline into Databricks from IoT devices, requiring efficient handling of high-velocity and high-volume data. Which approach ensures scalability and reliability of the data ingestion layer?
A
Deploy individual Databricks jobs for each IoT device to ensure dedicated resources for ingestion.
B
Batch ingest data at hourly intervals to reduce the load on the Databricks clusters.
C
Directly stream IoT data into Delta tables without any intermediate processing to minimize latency.
D
Use Azure Event Hubs as a front buffer to ingest data streams, then process and store the data in Delta Lake using Databricks Structured Streaming.