
Answer-first summary for fast verification
Answer: Use Delta Lake for storage, enabling time-travel queries for accessing historical data.
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. It provides schema enforcement, data versioning, and time travel, making it well suited to managing long-term historical data in streaming applications. Time-travel queries let you read a table as it existed at any point in time, so historical data can be analyzed alongside real-time streams. This removes the need to keep history in a separate traditional data warehouse or to build custom mechanisms such as sliding windows, and Delta Lake's support for data compaction and optimization keeps storage and query performance efficient as data grows. Archiving older data to cloud storage with external table references is workable, but Delta Lake offers a more streamlined, integrated approach for Structured Streaming environments.
Author: LeetQuiz Editorial Team
In a streaming application that needs to analyze trends over several years, how can you effectively manage and query large volumes of historical data alongside real-time streams?
A
Store historical data in a traditional data warehouse and query it separately from streaming data.
B
Implement a custom sliding window mechanism over a Parquet-based data lake.
C
Use Delta Lake for storage, enabling time-travel queries for accessing historical data.
D
Archive older data to cloud storage and utilize external table references for querying.