
Question 22
A data analyst has provided a data engineering team with the following Spark SQL query:
SELECT district,
avg(sales)
FROM store_sales_20220101
GROUP BY district;
The data analyst would like the data engineering team to run this query every day. The date at the end of the table name (20220101) should automatically be replaced with the current date each time the query is run.
Which of the following approaches could be used by the data engineering team to efficiently automate this process?
A. They could wrap the query using PySpark and use Python's string variable system to automatically update the table name.
B. They could manually replace the date within the table name with the current day's date.
C. They could request that the data analyst rewrites the query to be run less frequently.
D. They could replace the string-formatted date in the table with a timestamp-formatted date.
E. They could pass the table into PySpark and develop a robustly tested module on the existing query.
Explanation:
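Option A is the correct answer because wrapping the query in PySpark lets the team build the table name with Python string formatting (for example, an f-string containing the current date formatted as YYYYMMDD) and pass the resulting query string to spark.sql. Scheduled as a daily job, the query then targets the current day's table with no manual edits.
Other options are incorrect:
B requires a manual edit every day, so nothing is automated.
C changes how often the query runs but does not update the table name.
D changes how the date is formatted but still leaves a fixed date in the table name.
E adds development and testing effort without addressing the changing table name.
Building the table name dynamically in this way is a common pattern in production Databricks pipelines that process data daily.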
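A minimal sketch of what option A could look like, assuming the daily tables keep the store_sales_YYYYMMDD naming seen in the question. The function names build_daily_query and run_daily_query are hypothetical, and the spark argument is assumed to be an active SparkSession (in a Databricks notebook one is normally available as spark).

```python
from datetime import date
from typing import Optional


def build_daily_query(run_date: date) -> str:
    """Return the analyst's query pointed at the table for run_date."""
    # Assumption: the table suffix is the date formatted as YYYYMMDD,
    # matching store_sales_20220101 in the question.
    table_name = f"store_sales_{run_date.strftime('%Y%m%d')}"
    return (
        "SELECT district, avg(sales) "
        f"FROM {table_name} "
        "GROUP BY district"
    )


def run_daily_query(spark, run_date: Optional[date] = None):
    """Run the query for run_date, defaulting to today's date.

    `spark` is assumed to be an active SparkSession; it is passed in
    so that the string-building logic can be checked without a cluster.
    """
    query = build_daily_query(run_date or date.today())
    return spark.sql(query)


if __name__ == "__main__":
    # Show the query that would be run for the date used in the question.
    print(build_daily_query(date(2022, 1, 1)))
```

Keeping the string construction separate from the spark.sql call is a design choice for this sketch: the date logic can be verified on its own, and a scheduled job only needs to call run_daily_query(spark) once per day.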