Databricks Certified Data Engineer - Associate

Question 22

A data analyst has provided a data engineering team with the following Spark SQL query:

SELECT district,
       avg(sales)
FROM store_sales_20220101
GROUP BY district;

The data analyst would like the data engineering team to run this query every day. The date at the end of the table name (20220101) should automatically be replaced with the current date each time the query is run.

Which of the following approaches could be used by the data engineering team to efficiently automate this process?

A. They could wrap the query using PySpark and use Python's string variable system to automatically update the table name.

B. They could manually replace the date within the table name with the current day's date.

C. They could request that the data analyst rewrites the query to be run less frequently.

D. They could replace the string-formatted date in the table with a timestamp-formatted date.

E. They could pass the table into PySpark and develop a robustly tested module on the existing query.

Explanation:

Option A is the correct answer because:

Automation: Using PySpark with Python's string formatting allows dynamic generation of table names based on the current date, enabling fully automated daily execution
Efficiency: This approach eliminates manual intervention and can be scheduled to run automatically
Scalability: The solution can be easily integrated into production pipelines and scheduled workflows
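
A minimal sketch of the approach in Option A, assuming the code runs somewhere a SparkSession is already available (as it is in a Databricks notebook or job). The names `TABLE_PREFIX`, `build_daily_query`, and `run_daily_query` are hypothetical and chosen only for illustration; the query text is the analyst's query from the question.

```python
from datetime import date
from typing import Optional

# Hypothetical constant: the fixed part of the table name from the question.
TABLE_PREFIX = "store_sales_"


def build_daily_query(run_date: Optional[date] = None) -> str:
    """Return the analyst's query with the table suffix set to run_date (YYYYMMDD)."""
    # Default to the current date so a scheduled run needs no arguments.
    run_date = run_date or date.today()
    table_name = f"{TABLE_PREFIX}{run_date.strftime('%Y%m%d')}"
    return (
        "SELECT district,\n"
        "       avg(sales)\n"
        f"FROM {table_name}\n"
        "GROUP BY district"
    )


def run_daily_query(spark, run_date: Optional[date] = None):
    """Run the generated query on an existing SparkSession and return a DataFrame.

    Assumption: `spark` is an active SparkSession supplied by the caller.
    """
    return spark.sql(build_daily_query(run_date))
```

In a scheduled notebook, calling `run_daily_query(spark)` once per day would pick up that day's table. Keeping the string construction separate from the `spark.sql` call makes the date logic easy to check without a cluster.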

Other options are incorrect:

B: Manual replacement is inefficient and not scalable for daily automation
C: Running the query less often contradicts the daily requirement and still leaves the date to be updated by hand
D: Changing the date format doesn't address the dynamic table name requirement
E: Developing a tested module is good practice, but on its own it doesn't make the table name track the current date

Dynamic table references of this kind are common in daily data processing workflows, and a PySpark wrapper that builds the table name at run time can be scheduled like any other production pipeline step.
