The data engineering team maintains a bronze table named 'orders_raw' where new orders data is appended nightly. They aim to create a Silver table, 'orders_cleaned', for a refined view of the orders data. What is the most cost-effective batch processing solution to propagate new records from 'orders_raw' to 'orders_cleaned'?
Explanation:
Databricks supports the trigger(availableNow=True) option for Delta Lake and Auto Loader sources. It consumes all records available at the time the query starts as an incremental batch and then stops, so compute is used only while there is data to process. trigger(once=True) was the earlier option for incremental batch processing, but it is deprecated in newer Databricks Runtime versions. For new incremental batch processing workloads, Databricks recommends trigger(availableNow=True). Reference: Databricks Documentation.
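The pattern described above might look like the following minimal PySpark sketch. The function name propagate_new_orders and the checkpoint_path argument are hypothetical, introduced only for illustration. The sketch assumes an existing SparkSession on a runtime that supports availableNow, and it omits any actual cleaning logic between the read and the write.

```python
def propagate_new_orders(spark, checkpoint_path):
    """Copy records not yet processed from orders_raw to orders_cleaned, then stop.

    `spark` is an existing SparkSession; `checkpoint_path` is a hypothetical
    location where the stream records which data it has already processed.
    """
    query = (
        spark.readStream
        .table("orders_raw")                            # read the bronze Delta table incrementally
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # progress tracking across runs
        .trigger(availableNow=True)                     # process everything available, then stop
        .toTable("orders_cleaned")                      # start the query, writing to the silver table
    )
    query.awaitTermination()                            # block until the available data is processed
    return query
```

Scheduled as a nightly job, each run picks up only the records appended since the previous run, because the checkpoint remembers how far the last run got.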