
Answer-first summary for fast verification
Answer: Use Spark Structured Streaming to process new records from 'orders_raw' in batch mode with the trigger availableNow option.
Databricks supports `trigger(availableNow=True)` for Delta Lake and Auto Loader sources, letting a streaming query consume all records that are currently available as an incremental batch and then stop on its own. `trigger(once=True)` served the same purpose but is deprecated in newer Databricks Runtime versions; for new incremental batch workloads, Databricks recommends `trigger(availableNow=True)`, which can also split a large backlog across multiple micro-batches for better scalability. Because the checkpoint tracks what has already been consumed, each nightly run processes only the records appended since the previous run, making this the most cost-effective option among the choices. Reference: [Databricks Documentation](https://docs.databricks.com/structured-streaming/triggers.html#configuring-incremental-batch-processing).
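The pattern described above can be sketched as a PySpark job. This is a minimal illustration, not a definitive implementation: the checkpoint path, the `order_id` key, and the cleaning rules are assumptions invented for the example, and the function expects an existing `SparkSession` on a cluster where the tables exist.

```python
# Hedged sketch: propagate new records from the bronze table 'orders_raw'
# to the silver table 'orders_cleaned' as an incremental batch.
# The checkpoint path and cleaning logic below are illustrative assumptions.

SOURCE_TABLE = "orders_raw"
TARGET_TABLE = "orders_cleaned"
CHECKPOINT_PATH = "/mnt/checkpoints/orders_cleaned"  # hypothetical location


def run_incremental_batch(spark):
    """Consume all records not yet processed, write them to silver, then stop."""
    cleaned = (
        spark.readStream
        .table(SOURCE_TABLE)                  # Delta table as a streaming source
        .dropDuplicates(["order_id"])         # example cleaning step (assumed key)
        .filter("order_total IS NOT NULL")    # example validation rule
    )
    query = (
        cleaned.writeStream
        .option("checkpointLocation", CHECKPOINT_PATH)  # remembers progress between runs
        .trigger(availableNow=True)           # drain the current backlog, then terminate
        .toTable(TARGET_TABLE)
    )
    query.awaitTermination()                  # returns once all available data is written
```

Scheduled nightly (for example, as a Databricks job), each run pays for compute only while draining the backlog, unlike option B's 24-hour processing-time trigger, which keeps a cluster running continuously.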
Author: LeetQuiz Editorial Team
The data engineering team maintains a bronze table named 'orders_raw' where new orders data is appended nightly. They aim to create a Silver table, 'orders_cleaned', for a refined view of the orders data. What is the most cost-effective batch processing solution to propagate new records from 'orders_raw' to 'orders_cleaned'?
A
Utilize Delta Lake's time travel feature to compare the latest version of 'orders_raw' with its previous version, then write the differences to 'orders_cleaned'.
B
Implement Spark Structured Streaming's foreachBatch logic with a 24-hour processing time trigger to process new records from 'orders_raw'.
C
Reprocess all records in 'orders_raw' using batch overwrite logic to update 'orders_cleaned'.
D
Apply an insert-only merge into 'orders_cleaned' using data from 'orders_raw' based on a composite key.
E
Use Spark Structured Streaming to process new records from 'orders_raw' in batch mode with the trigger availableNow option.