
Answer-first summary for fast verification
Answer: Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.
✅ **B. Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.** Databricks notebooks support notebook parameters, which can be set during job runs or interactively. By defining `batch_date` as a notebook parameter, you can pass different values each time the program runs. This value can be accessed within the notebook using Databricks utilities (`dbutils.widgets.get("batch_date")`), assigned to a Python variable, and used in a Spark DataFrame's where clause for filtering. This method is flexible, clean, and integrates well with Databricks job scheduling.

❌ **A. Store the batch date in the Spark configuration and use a Spark DataFrame to filter the data based on the Spark configuration.** While possible, using Spark configuration for a dynamically changing batch date is less straightforward and conventional than using notebook parameters, adding unnecessary complexity.

❌ **C. Manually edit the code every time to change the batch date.** This approach is inefficient, error-prone, and contradicts the requirement of avoiding manual changes.

❌ **D. Create a dynamic view that automatically calculates the batch date and use this view to query the data.** While views can simplify querying, this option doesn't directly address the need for a parameterized batch date that changes with each run. The logic for determining the batch date would still need to be defined, potentially leading back to a solution similar to option B.

❌ **E. There is no way to combine a Python variable and Spark code for filtering.** This is incorrect. Databricks notebooks seamlessly integrate Python and Spark, allowing Python variables to be used within Spark code, including DataFrame filtering operations.
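
The approach in option B can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `read_batch_date` and `filter_by_batch_date`, the `sales` table, the `batch_date` column name, and the `YYYY-MM-DD` format are all assumptions made for the example. Only `dbutils.widgets.text`, `dbutils.widgets.get`, and the DataFrame `where` call are the Databricks and PySpark pieces the explanation refers to.

```python
from datetime import date


def read_batch_date(widgets, name="batch_date"):
    """Read a notebook parameter and parse it as a YYYY-MM-DD date (assumed format)."""
    raw = widgets.get(name)  # widget values are returned as strings
    return date.fromisoformat(raw.strip())  # raises ValueError on malformed input


def filter_by_batch_date(df, batch_date, column="batch_date"):
    """Keep only the rows whose date column equals the given batch date."""
    from pyspark.sql import functions as F  # imported lazily; requires PySpark

    return df.where(F.col(column) == F.lit(batch_date))


# `dbutils` and `spark` are only defined inside a Databricks notebook,
# so this part is skipped when the code runs anywhere else.
if "dbutils" in globals() and "spark" in globals():
    # Declare the parameter with a placeholder default value.
    dbutils.widgets.text("batch_date", "2024-01-01")
    batch_date = read_batch_date(dbutils.widgets)
    # "sales" is a hypothetical table name used only for illustration.
    daily = filter_by_batch_date(spark.table("sales"), batch_date)
    daily.show()
```

When the notebook runs as a scheduled job, a task parameter named `batch_date` would normally supply the widget's value, so the code itself does not need to change between runs. Parsing the string into a `date` up front is a design choice here: it makes a malformed parameter fail early instead of silently matching no rows.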
Author: LeetQuiz Editorial Team
How can you parameterize a query to filter data based on a batch date that changes with each run, without manually altering the code each time?
A. Store the batch date in the Spark configuration and use a Spark DataFrame to filter the data based on the Spark configuration.
B. Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.
C. Manually edit the code every time to change the batch date.
D. Create a dynamic view that automatically calculates the batch date and use this view to query the data.
E. There is no way to combine a Python variable and Spark code for filtering.