
Databricks Certified Data Engineer - Associate
How can you parameterize a query to filter data based on a batch date that changes with each run, without manually altering the code each time?
Explanation:
✅ B. Create a notebook parameter for batch date, assign its value to a Python variable, and use a Spark DataFrame to filter the data based on this variable.
Databricks notebooks support notebook parameters (widgets), which can be set during job runs or interactively. By defining batch_date as a notebook parameter, you can pass a different value each time the job runs. The value can be read inside the notebook with dbutils.widgets.get("batch_date"), assigned to a Python variable, and used in a Spark DataFrame's where clause for filtering. This approach is flexible, clean, and integrates well with Databricks job scheduling.
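A minimal sketch of this pattern is shown below. The widget name batch_date is taken from the explanation, while the table name sales.orders and column name event_date are illustrative assumptions; dbutils and spark are available by default in a Databricks notebook.

```python
from pyspark.sql import functions as F

# Define a notebook widget so a job run (or an interactive user) can supply the batch date.
# The default value is only a placeholder.
dbutils.widgets.text("batch_date", "2024-01-01")

# Read the parameter into a Python variable.
batch_date = dbutils.widgets.get("batch_date")

# Use the variable to filter a Spark DataFrame (table and column names are illustrative).
orders = spark.read.table("sales.orders")
daily_orders = orders.where(F.col("event_date") == batch_date)

display(daily_orders)
```

When the notebook is attached to a job, the same parameter can be supplied in the job's task configuration, so no code changes are needed between runs.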
❌ A. Store the batch date in the Spark configuration and use a Spark DataFrame to filter the data based on the Spark configuration. While possible, using the Spark configuration for a batch date that changes on every run is less straightforward and less conventional than using notebook parameters, and it adds unnecessary complexity. A rough sketch of what this would look like follows below.
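For contrast, the option A approach might look roughly like this sketch (the configuration key spark.batch_date and the table/column names are assumptions). The value still has to be injected into the session or cluster configuration somewhere, which is why it is less convenient than a widget:

```python
from pyspark.sql import functions as F

# The batch date has to be set on the session (or passed as a job/cluster Spark config).
spark.conf.set("spark.batch_date", "2024-01-01")

# Later, read it back and use it for filtering.
batch_date = spark.conf.get("spark.batch_date")
orders = spark.read.table("sales.orders")
daily_orders = orders.where(F.col("event_date") == batch_date)
```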
❌ C. Manually edit the code every time to change the batch date. This approach is inefficient, error-prone, and contradicts the requirement of avoiding manual changes.
❌ D. Create a dynamic view that automatically calculates the batch date and use this view to query the data. While views can simplify querying, this option doesn't directly address the need for a parameterized batch date that changes with each run. The logic for determining the batch date would still need to be defined, potentially leading back to a solution similar to option B.
❌ E. There is no way to combine a Python variable and Spark code for filtering. This is incorrect. Databricks notebooks seamlessly integrate Python and Spark, allowing Python variables to be used within Spark code, including DataFrame filtering operations.