A data engineering team is automating a Spark SQL query to compile monthly sales data from a table named in the format `monthly_sales_YYYYMM`, where `YYYYMM` represents the year and month. The query must automatically target the previous month's data. For example, if it's March 2023, the query should access `monthly_sales_202302`. The standard query is: `SELECT product_category, SUM(sales) FROM monthly_sales_YYYYMM GROUP BY product_category;`. What strategy should the team use to ensure the query dynamically adjusts to the previous month?

Real Exam

Propose altering the query's execution frequency to quarterly, thereby reducing the need for monthly table name updates.

0.5%

Manually update the table name in the query to reflect the previous month's data before executing it each time.

0.5%

Develop a PySpark script that calculates the date for the previous month and dynamically inserts this date into the table name within the query string.

92.9%

Revise the database design to consolidate sales data into a single table with a month column, avoiding the necessity for separate monthly tables.

6.0%
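
The PySpark option above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `previous_month_suffix` and `build_monthly_sales_query` are hypothetical, and the date arithmetic uses only the Python standard library so it can be checked without a cluster. It steps back from the first day of the current month by one day, which lands in the previous month even across a year boundary.

```python
from datetime import date, timedelta
from typing import Optional


def previous_month_suffix(today: Optional[date] = None) -> str:
    """Return the previous month as a YYYYMM string (hypothetical helper)."""
    today = today or date.today()
    # First day of the current month minus one day is the last day of the previous month.
    last_day_of_previous_month = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous_month.strftime("%Y%m")


def build_monthly_sales_query(today: Optional[date] = None) -> str:
    """Build the query string against monthly_sales_YYYYMM for the previous month."""
    table_name = f"monthly_sales_{previous_month_suffix(today)}"
    return (
        "SELECT product_category, SUM(sales) "
        f"FROM {table_name} "
        "GROUP BY product_category"
    )


# Example: a run in March 2023 targets monthly_sales_202302.
query = build_monthly_sales_query(date(2023, 3, 15))
print(query)

# In a Databricks notebook or any job with an active SparkSession named `spark`
# (an assumption here), the string would then be executed with:
# result_df = spark.sql(query)
```

Because the table suffix is derived from a date rather than from user input, interpolating it into the query string is safe in this sketch; a scheduled job would simply call `build_monthly_sales_query()` with no argument.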

Databricks Certified Data Engineer - Associate
