
Answer-first summary for fast verification
Answer: Develop a PySpark script that calculates the date for the previous month and dynamically inserts this date into the table name within the query string.
Using Python's `datetime` functionality inside a PySpark script lets the team compute the previous month at run time and build the table name from it, for example `monthly_sales_202302` when the job runs in March 2023. Because the query string is constructed dynamically, it always targets the correct table, which removes manual updates, reduces the chance of human error, and keeps the data available on schedule.
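One way this could look is sketched below. It is a minimal illustration, not a definitive implementation: the helper names `previous_month_suffix`, `build_query`, and `run_monthly_report` are hypothetical, and the sketch assumes an existing `SparkSession` is passed in as `spark` and that the monthly tables are already registered in the catalog.

```python
from datetime import date, timedelta


def previous_month_suffix(today: date) -> str:
    """Return the previous month as a YYYYMM string (hypothetical helper)."""
    # Step back one day from the first of the current month to land in the
    # previous month; this also handles the January -> December rollover.
    first_of_this_month = today.replace(day=1)
    last_of_previous_month = first_of_this_month - timedelta(days=1)
    return last_of_previous_month.strftime("%Y%m")


def build_query(today: date) -> str:
    """Build the Spark SQL query string for the previous month's table."""
    table_name = f"monthly_sales_{previous_month_suffix(today)}"
    return (
        "SELECT product_category, SUM(sales) "
        f"FROM {table_name} "
        "GROUP BY product_category"
    )


def run_monthly_report(spark, today=None):
    """Run the query; `spark` is assumed to be an existing SparkSession."""
    return spark.sql(build_query(today or date.today()))
```

For a run date of 15 March 2023, `build_query(date(2023, 3, 15))` produces a query against `monthly_sales_202302`. Keeping the date logic in a plain function that takes the run date as an argument makes it possible to check the table name without starting a Spark session.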
Author: LeetQuiz Editorial Team
A data engineering team is automating a Spark SQL query to compile monthly sales data from a table named in the format `monthly_sales_YYYYMM`, where `YYYYMM` represents the year and month. The query must automatically target the previous month's data. For example, if it's March 2023, the query should access `monthly_sales_202302`. The standard query is: `SELECT product_category, SUM(sales) FROM monthly_sales_YYYYMM GROUP BY product_category;` What strategy should the team use to ensure the query dynamically adjusts to the previous month?
A
Propose altering the query's execution frequency to quarterly, thereby reducing the need for monthly table name updates.
B
Manually update the table name in the query to reflect the previous month's data before executing it each time.
C
Develop a PySpark script that calculates the date for the previous month and dynamically inserts this date into the table name within the query string.
D
Revise the database design to consolidate sales data into a single table with a month column, avoiding the necessity for separate monthly tables.