
Explanation:
✅ B. Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.
Databricks notebooks support notebook parameters (widgets), which can be set interactively or passed in by a job run. By defining batch_date as a notebook parameter, you can supply a different value on each run without touching the code. Inside the notebook, dbutils.widgets.get("batch_date") returns the value as a string. You assign it to a Python variable and use that variable in the Spark DataFrame's where (or filter) clause. The code stays identical between runs, and the approach fits Databricks job scheduling because each scheduled or triggered run can pass its own batch_date.
❌ A. Store the batch date in the Spark configuration and use a Spark DataFrame to filter the data based on the Spark configuration. While possible, this is less straightforward and less conventional than a notebook parameter. Spark configuration is meant for engine and session settings, and you would still need a separate mechanism to set the value before every run, which adds unnecessary complexity.
❌ C. Manually edit the code every time to change the batch date. This approach is inefficient, error-prone, and contradicts the requirement of avoiding manual changes.
❌ D. Create a dynamic view that automatically calculates the batch date and use this view to query the data. While views can simplify querying, this option doesn't directly address the need for a parameterized batch date that changes with each run. The logic for determining the batch date would still need to be defined, potentially leading back to a solution similar to option B.
❌ E. There is no way to combine a Python variable and Spark code for filtering. This is incorrect. Databricks notebooks seamlessly integrate Python and Spark, allowing Python variables to be used within Spark code, including DataFrame filtering operations.
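A minimal sketch of option B, assuming the batch date arrives as an ISO string such as "2024-01-15" and the data has a date column named batch_date. The helper names `read_batch_date` and `filter_by_batch_date` and the table name `sales` are hypothetical. `dbutils` and `spark` exist only inside a Databricks notebook, so the notebook lines are shown as comments; the helpers take a widgets-like object so they can be exercised outside Databricks.

```python
import datetime


def read_batch_date(widgets, name="batch_date"):
    """Return the notebook parameter `name` parsed as a date.

    `widgets` is expected to behave like Databricks' `dbutils.widgets`,
    whose get(name) returns the parameter value as a string.
    """
    raw = widgets.get(name).strip()
    # Assumption: the job passes dates in ISO format, e.g. "2024-01-15".
    return datetime.date.fromisoformat(raw)


def filter_by_batch_date(df, batch_date, column="batch_date"):
    """Keep only the rows of a Spark DataFrame whose `column` equals batch_date."""
    # Imported here so read_batch_date can be used without Spark installed.
    from pyspark.sql import functions as F
    return df.where(F.col(column) == F.lit(batch_date))


# In a Databricks notebook (dbutils and spark are predefined there):
# dbutils.widgets.text("batch_date", "2024-01-15", "Batch date")  # default for interactive runs
# batch_date = read_batch_date(dbutils.widgets)                    # a job run overrides the default
# daily = filter_by_batch_date(spark.table("sales"), batch_date)   # "sales" is a hypothetical table
```

Parsing the string up front means a malformed parameter fails before the query runs, rather than silently matching no rows.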
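For comparison, a sketch of option A, assuming a hypothetical custom key `pipeline.batch_date` and a cluster that allows setting custom keys at runtime. Reading the value is simple (`spark.conf.get` returns a string), but something still has to call `spark.conf.set` with the right date before every run, which is the extra indirection described in the explanation of option A.

```python
import datetime

# Hypothetical custom configuration key, not a built-in Spark setting.
BATCH_DATE_KEY = "pipeline.batch_date"


def batch_date_from_conf(conf, key=BATCH_DATE_KEY):
    """Parse the batch date stored under `key` in a Spark-style runtime config.

    `conf` is expected to behave like `spark.conf`, whose get(key) returns a string.
    """
    return datetime.date.fromisoformat(conf.get(key))


# In a Databricks notebook:
# spark.conf.set(BATCH_DATE_KEY, "2024-01-15")  # must still be set somewhere before every run
# from pyspark.sql import functions as F
# df = spark.table("sales").where(F.col("batch_date") == batch_date_from_conf(spark.conf))
```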
How can you parameterize a query to filter data based on a batch date that changes with each run, without manually altering the code each time?
A. Store the batch date in the Spark configuration and use a Spark DataFrame to filter the data based on the Spark configuration.
B. Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.
C. Manually edit the code every time to change the batch date.
D. Create a dynamic view that automatically calculates the batch date and use this view to query the data.
E. There is no way to combine a Python variable and Spark code for filtering.