
Answer-first summary for fast verification
Answer:

```python
preds.write.mode("append").saveAsTable("churn_preds")
```
### Correct Choice

**B. `preds.write.mode("append").saveAsTable("churn_preds")`**

### Why it's the best fit:

1. **Cost-Efficiency:** For data that is processed only once per day, a batch write is significantly cheaper than Structured Streaming. A batch job performs the write and lets the compute resources shut down immediately, whereas a continuous stream (even an idle one) and the overhead of managing streaming checkpoints are unnecessary for daily data.
2. **Historical Integrity:** The `append` mode adds each day's predictions as new rows to the existing Delta table. This preserves historical data, enabling comparisons across time.
3. **Managed Delta Lake Table:** On Databricks, `saveAsTable` defaults to the Delta format, providing ACID transactions and time-travel capabilities out of the box.

### Analysis of Other Options:

* **Option C:** The default write mode for Spark's batch `DataFrameWriter` is `errorIfExists`, so this snippet would fail on the second day because the path already exists.
* **Option E:** Using `overwrite` replaces the entire dataset every day, making the historical comparisons the team requested impossible.
* **Options A & D:** Structured Streaming is inefficient for data that arrives once a day. Even with `Trigger.AvailableNow`, a simple batch `append` has lower overhead and is the preferred choice at this frequency. Option A is also invalid as written: `overwrite` is not a supported streaming output mode (only `append`, `complete`, and `update` are), and a `complete`-style rewrite would in any case destroy the history the team requires.
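The behavioral difference between the write modes can be illustrated without a Spark cluster. The `DeltaTableSim` class below is a hypothetical stand-in (not Spark or Delta Lake code) that mimics how `append`, `overwrite`, and the batch default `errorIfExists` each treat an already-existing table:

```python
# Toy simulation of batch write modes (illustration only, not Spark code).
class DeltaTableSim:
    def __init__(self):
        self.rows = None  # None means the table does not exist yet

    def write(self, new_rows, mode="errorIfExists"):
        if self.rows is None:
            self.rows = list(new_rows)      # first write always creates the table
        elif mode == "append":
            self.rows.extend(new_rows)      # keeps prior rows -> Option B
        elif mode == "overwrite":
            self.rows = list(new_rows)      # replaces prior rows -> Option E
        else:                               # errorIfExists (Spark's batch default) -> Option C
            raise ValueError("table already exists")

table = DeltaTableSim()
table.write([("2024-01-01", "user1", 0.91)], mode="append")  # day 1
table.write([("2024-01-02", "user1", 0.42)], mode="append")  # day 2
print(len(table.rows))  # both days retained -> 2
```

Running the same two daily writes with `mode="overwrite"` would leave only day 2's row, and with the default mode the second write would raise, which is exactly why only the `append` snippet satisfies the historical-analysis requirement.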
Author: LeetQuiz Editorial Team
A data science team needs to store daily churn predictions generated by a production MLflow model in a Delta Lake table. The solution must support historical analysis, allowing data scientists to compare predictions over time. Churn predictions are generated at most once per day. Which code snippet achieves this requirement with the lowest compute overhead and cost?
A
```python
(preds.writeStream
    .outputMode("overwrite")
    .option("checkpointPath", "/_checkpoints/churn_preds")
    .start("/preds/churn_preds"))
```
B
```python
preds.write.mode("append").saveAsTable("churn_preds")
```
C
```python
preds.write.format("delta").save("/preds/churn_preds")
```
D
```python
(preds.writeStream
    .outputMode("append")
    .option("checkpointPath", "/_checkpoints/churn_preds")
    .table("churn_preds"))
```
E
```python
(preds.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("churn_preds"))
```