
Answer-first summary for fast verification
Answer: `.option("ignoreDeletes", True)`
Partitioning on a date-derived column such as `year` makes it cheap to delete data older than a given age, because a delete that filters only on the partition column removes whole partitions. A Delta table used as a streaming source is expected to be append-only, though, so by default a streaming query fails when it encounters a delete in the source table. Setting the reader option `ignoreDeletes` to `True` tells the stream to ignore transactions that delete data at partition boundaries, so the query keeps running after old partitions are dropped. The other choices do not do this: `withWatermark` bounds state for late-arriving data and does not change how source deletes are handled, DataFrames have no `.window` method, and options C and D pass the wrong key or value. Reference: [Delta Lake Documentation on Ignoring Updates and Deletes](https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes)
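As a minimal sketch of where the option is applied, the helper below sets `ignoreDeletes` on the stream reader before the table is loaded. The function name `read_user_posts_stream` is hypothetical, and `spark` is assumed to be an active `SparkSession` in an environment where Delta Lake is available.

```python
def read_user_posts_stream(spark, table_name="user_posts"):
    """Return a streaming DataFrame over a Delta table that tolerates partition deletes.

    `spark` is assumed to be an active SparkSession (not created here).
    """
    return (
        spark.readStream
        # Reader option: ignore transactions that delete data at partition boundaries
        .option("ignoreDeletes", True)
        .table(table_name)
    )
```

The returned DataFrame can then be aggregated and written with `writeStream` in the same way as the query in the question.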
Author: LeetQuiz Editorial Team
The data engineering team is working with a large Delta Lake table named 'user_posts', partitioned by the 'year' column. This table serves as a streaming source for a job. The streaming query is partially shown below, with a blank to fill in:
    spark.readStream
        ________________
        .table("user_posts")
        .groupBy("post_category", "post_date")
        .agg(
            count("post_id").alias("posts_count"),
            sum("likes").alias("total_likes")
        )
        .writeStream
        .option("checkpointLocation", "dbfs:/path/checkpoint")
        .table("posts_stats")
The team wants to delete data older than 2 years, an operation that violates the append-only requirement of streaming sources. Which option correctly fills the blank so that the table remains usable as a streaming source after the partitions are deleted?
A. `.withWatermark("year", "INTERVAL 2 YEARS")`
B. `.window("year", "INTERVAL 2 YEARS")`
C. `.option("year", "ignoreDeletes")`
D. `.option("ignoreDeletes", "year")`
E. `.option("ignoreDeletes", True)`
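For context, the retention delete itself could be sketched as follows. This is an illustration under stated assumptions, not part of the original question: the helper name `retention_delete_sql` is hypothetical, and the cutoff rule (keep the current year and the `retention_years` before it) is one possible reading of "older than 2 years" at year granularity. The `WHERE` clause filters only on the `year` partition column, so the delete stays aligned to partition boundaries.

```python
from datetime import date


def retention_delete_sql(table_name, retention_years, today=None):
    """Build a DELETE statement that targets whole `year` partitions."""
    today = today or date.today()
    # Assumed cutoff rule: partitions with year < cutoff_year are removed
    cutoff_year = today.year - retention_years
    # Filtering only on the partition column keeps the delete partition-aligned
    return f"DELETE FROM {table_name} WHERE year < {cutoff_year}"


print(retention_delete_sql("user_posts", 2, date(2024, 6, 1)))
# prints: DELETE FROM user_posts WHERE year < 2022
```

The resulting string could be run with `spark.sql(...)` on a session where the table exists.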