
Ultimate access to all questions.
Given the following streaming query in Databricks, which statement best describes its functionality?
spark.readStream \
.table("orders_cleaned") \
.withWatermark("order_timestamp", "10 minutes") \
.groupBy(
window("order_timestamp", "5 minutes").alias("time"),
"author"
) \
.agg(
count("order_id").alias("orders_count"),
avg("quantity").alias("avg_quantity")
) \
.writeStream \
.option("checkpointLocation", "dbfs:/path/checkpoint") \
.table("orders_stats")
spark.readStream \
.table("orders_cleaned") \
.withWatermark("order_timestamp", "10 minutes") \
.groupBy(
window("order_timestamp", "5 minutes").alias("time"),
"author"
) \
.agg(
count("order_id").alias("orders_count"),
avg("quantity").alias("avg_quantity")
) \
.writeStream \
.option("checkpointLocation", "dbfs:/path/checkpoint") \
.table("orders_stats")
_
A
It computes business-level aggregates for each non-overlapping ten-minute interval, maintaining incremental state information for 5 minutes to accommodate late-arriving data.
B
It computes business-level aggregates for each overlapping five-minute interval, maintaining incremental state information for 10 minutes to accommodate late-arriving data.
C
It computes business-level aggregates for each non-overlapping five-minute interval, maintaining incremental state information for 10 minutes to accommodate late-arriving data.
D
It computes business-level aggregates for each overlapping ten-minute interval, maintaining incremental state information for 5 minutes to accommodate late-arriving data.
E
None of the above statements accurately describe this query.