You are analyzing employee salary data to optimize departmental budgets. You have a DataFrame 'df' with columns 'employee_id', 'department_id', and 'salary'. Your goal is to calculate the average salary for each department, excluding rows where 'department_id' is NULL so that employees without an assigned department do not form a spurious group in the result. Which of the following Spark SQL queries achieves this efficiently? Choose the best option from the four provided below.