
Answer-first summary for fast verification
Answer: They could wrap the query using PySpark and use Python's string variable system to automatically update the table name.
## Explanation

Option **A** is the correct answer because:

- **Automation**: Wrapping the query in PySpark and using Python's string formatting allows dynamic generation of the table name from the current date, enabling fully automated daily execution.
- **Efficiency**: This approach eliminates manual intervention and can be scheduled to run automatically.
- **Scalability**: The solution can be easily integrated into production pipelines and scheduled workflows.

The other options are incorrect:

- **B**: Manually replacing the date is inefficient and does not scale to daily automation.
- **C**: Changing the run frequency does not solve the automation problem.
- **D**: Changing the date format does not address the dynamic table name requirement.
- **E**: While developing a robustly tested module is good practice, it does not specifically address automating the dynamic table name.

This approach aligns with Databricks best practices for production pipelines, where dynamic table references are common in daily data processing workflows.
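As a minimal sketch of option A, the query string can be built with Python's date formatting and an f-string; the helper name `build_query` is illustrative, and the final `spark.sql(...)` call assumes an existing SparkSession in the scheduled job:

```python
from datetime import date

def build_query(run_date: date) -> str:
    # Format the date as YYYYMMDD to match the table-name suffix,
    # e.g. store_sales_20220101
    suffix = run_date.strftime("%Y%m%d")
    return (
        f"SELECT district, avg(sales) "
        f"FROM store_sales_{suffix} "
        f"GROUP BY district"
    )

# Each scheduled run picks up the current date automatically:
query = build_query(date.today())
# spark.sql(query)  # executed via the job's SparkSession in production
```

Because the suffix is computed at run time, the same code can be scheduled daily (for example, as a Databricks job) without any manual edits.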
Author: LeetQuiz.
Question 22
A data analyst has provided a data engineering team with the following Spark SQL query:
```sql
SELECT district,
       avg(sales)
FROM store_sales_20220101
GROUP BY district;
```
The data analyst would like the data engineering team to run this query every day. The date at the end of the table name (20220101) should automatically be replaced with the current date each time the query is run.
Which of the following approaches could be used by the data engineering team to efficiently automate this process?
**A.** They could wrap the query using PySpark and use Python's string variable system to automatically update the table name.

**B.** They could manually replace the date within the table name with the current day's date.

**C.** They could request that the data analyst rewrite the query to be run less frequently.

**D.** They could replace the string-formatted date in the table with a timestamp-formatted date.

**E.** They could pass the table into PySpark and develop a robustly tested module on the existing query.