You are tasked with optimizing a Spark job for incremental processing of a large dataset in Azure Databricks. A new dataset contains updates to a subset of an existing dataset, and your goal is efficient processing while minimizing data redundancy and cost. Consider the following constraints:

- The existing dataset is very large.
- The new dataset is significantly smaller but contains critical updates.
- The solution must comply with data governance policies that require minimal data movement.

Given these constraints, which of the following approaches is BEST for achieving efficient incremental processing? (Choose one option.)
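Since the answer options are not reproduced here, the following is only an illustrative sketch of the pattern typically associated with this scenario: an incremental upsert using Delta Lake's `MERGE INTO`, which rewrites only the data files containing matched keys rather than the whole table. It assumes both datasets are stored as Delta tables; the table name (`events`), join key (`event_id`), and update path are hypothetical placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# On Databricks, `spark` is already configured with Delta Lake support.
spark = SparkSession.builder.getOrCreate()

# Read only the new/changed records -- the much smaller dataset.
# Hypothetical path; substitute your own landing location.
updates_df = spark.read.format("delta").load("/mnt/landing/events_updates")

# The large existing dataset, registered as a Delta table.
target = DeltaTable.forName(spark, "events")

# MERGE touches only the files whose rows match the join condition,
# so most of the large table is never read, rewritten, or moved --
# which is what the minimal-data-movement constraint asks for.
(
    target.alias("t")
    .merge(updates_df.alias("u"), "t.event_id = u.event_id")
    .whenMatchedUpdateAll()      # apply critical updates to existing rows
    .whenNotMatchedInsertAll()   # append genuinely new rows
    .execute()
)
```

Contrast this with a full overwrite (re-reading and rewriting the entire large dataset) or a plain append (which duplicates rows): both would violate either the cost or the data-redundancy constraint in the question.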