The data engineering team is working with a large Delta Lake table named 'user_posts', partitioned by the 'year' column. This table serves as a streaming source for a job. The streaming query is partially shown below, with a blank to fill in:

    .table("user_posts")
    ________________
    .groupBy("post_category", "post_date")
    .agg(
        count("post_id").alias("posts_count"),
        sum("likes").alias("total_likes")
    )
    .writeStream
    .option("checkpointLocation", "dbfs:/path/checkpoint")
    .table("posts_stats")

The team aims to delete data from the previous 2 years without violating the append-only requirement of streaming sources. Which option correctly fills the blank to ensure the table remains streamable after partition deletion?

Real Exam

.withWatermark("year", "INTERVAL 2 YEARS") (16.0%)

.window("year", "INTERVAL 2 YEARS") (13.6%)

.option("year", "ignoreDeletes") (8.3%)

.option("ignoreDeletes", "year") (10.2%)

.option("ignoreDeletes", True) (51.9%)
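
For context, Delta Lake's streaming source has an `ignoreDeletes` reader option that tells the stream to ignore transactions that delete data at partition boundaries. The following is a minimal sketch of how that option might be set, not the exam's reference solution. The function name `read_user_posts_ignoring_deletes` is hypothetical, and `spark` is assumed to be an existing SparkSession with Delta Lake available. Note that in runnable PySpark, reader options are set on `spark.readStream` before `.table(...)`, whereas the question's snippet places the blank after it.

```python
def read_user_posts_ignoring_deletes(spark):
    """Return a streaming DataFrame over the Delta table 'user_posts'.

    Assumption: `spark` is an active SparkSession with Delta Lake support.
    The function name is illustrative only.
    """
    return (
        spark.readStream
        # Ignore commits that only drop whole partitions (for example,
        # a delete whose predicate filters on the 'year' partition column).
        .option("ignoreDeletes", "true")
        .table("user_posts")
    )
```

The returned DataFrame could then be passed through the `groupBy`/`agg`/`writeStream` chain from the question. The option is documented for deletes that remove data at partition boundaries, which is why the table being partitioned by `year` matters here.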

Databricks Certified Data Engineer - Professional
