
Answer-first summary for fast verification
Answer: They could wrap the queries using PySpark and use Python's control flow system to determine when to run the final query.
**Explanation:** **Why B is correct:** 1. **Python control flow**: PySpark allows you to wrap SQL queries and use Python's datetime functions to check the current day of the week. 2. **Conditional execution**: You can use `if` statements to execute the final query only when `datetime.today().weekday() == 6` (Sunday). 3. **Daily execution**: The program can still run daily, but the final query only executes on Sundays. 4. **Flexibility**: This approach provides the most control and doesn't require changing data models or restricting access. **Why other options are incorrect:** - **A**: Submitting a feature request is not a practical solution for immediate implementation. - **C**: Running the entire program only on Sundays contradicts the requirement to run daily. - **D**: Restricting table access would prevent the query from running, but it's not a clean solution and could affect other processes. - **E**: Redesigning the data model is overly complex for a simple scheduling requirement. **Implementation example:** ```python from datetime import datetime from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() # Run daily queries spark.sql("SELECT * FROM daily_table") spark.sql("SELECT * FROM intermediate_table") # Only run final query on Sundays if datetime.today().weekday() == 6: # Sunday spark.sql("SELECT * FROM final_table") ```
Author: Keng Suppaseth
Ultimate access to all questions.
No comments yet.
A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task. Which of the following approaches could be used by the data engineering team to complete this task?
A
They could submit a feature request with Databricks to add this functionality.
B
They could wrap the queries using PySpark and use Python's control flow system to determine when to run the final query.
C
They could only run the entire program on Sundays.
D
They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.
E
They could redesign the data model to separate the data used in the final query into a new table.