
The data engineering team is working with a large Delta Lake table named 'user_posts', partitioned by the 'year' column. This table serves as a streaming source for a job. The streaming query is partially shown below, with a blank to fill in:
.table("user_posts")
________________
.groupBy("post_category", "post_date")
.agg(
    count("post_id").alias("posts_count"),
    sum("likes").alias("total_likes")
)
.writeStream
.option("checkpointLocation", "dbfs:/path/checkpoint")
.table("posts_stats")
The team aims to delete data from the previous 2 years without violating the append-only requirement of streaming sources. Which option correctly fills the blank to ensure the table remains streamable after partition deletion?
A. .withWatermark("year", "INTERVAL 2 YEARS")
B. .window("year", "INTERVAL 2 YEARS")
C. .option("year", "ignoreDeletes")
D. .option("ignoreDeletes", "year")
E. .option("ignoreDeletes", True)
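
For reference, here is a minimal sketch of how a Delta Lake streaming read can be set up to tolerate deletes that drop whole partitions, such as removing old values of 'year'. It rests on several assumptions not stated in the question: `spark` is an active SparkSession with Delta Lake available, the helper names `read_user_posts` and `start_posts_stats` are hypothetical, the reader option is set before `.table()` (which returns the DataFrame), and the output mode is `complete` because the question's snippet does not show one.

```python
def read_user_posts(spark):
    """Return a streaming DataFrame over the Delta table 'user_posts'.

    `spark` is assumed to be an active SparkSession with Delta Lake available.
    """
    return (
        spark.readStream
        # Delta's ignoreDeletes option skips transactions that delete data
        # at partition boundaries, so the source stays readable as a stream.
        .option("ignoreDeletes", "true")
        .table("user_posts")
    )


def start_posts_stats(spark, checkpoint="dbfs:/path/checkpoint"):
    """Start the aggregation stream from the question (hypothetical helper)."""
    # Imported here so read_user_posts can be defined without pyspark loaded.
    from pyspark.sql import functions as F

    stats = (
        read_user_posts(spark)
        .groupBy("post_category", "post_date")
        .agg(
            F.count("post_id").alias("posts_count"),
            F.sum("likes").alias("total_likes"),
        )
    )
    return (
        stats.writeStream
        # Assumption: an aggregation without a watermark needs a non-append mode.
        .outputMode("complete")
        .option("checkpointLocation", checkpoint)
        .toTable("posts_stats")
    )
```

With this in place, a partition-level delete such as DELETE FROM user_posts WHERE year < 2022 (an illustrative cutoff) would not be expected to stop the stream, because the deleted rows all fall on partition boundaries. Deletes or updates that rewrite files inside a partition are a different case and are not covered by this option.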