
You are analyzing employee salary data to optimize departmental budgets. You have a DataFrame 'df' with columns 'employee_id', 'department_id', and 'salary'. Your goal is to calculate the average salary for each department, excluding rows whose 'department_id' is NULL so that they do not distort the analysis. Which of the following Spark SQL queries achieves this correctly and efficiently? Choose the best option from the four provided below.
A
SELECT department_id, AVG(salary) FROM df WHERE department_id IS NOT NULL GROUP BY department_id
B
SELECT department_id, AVG(salary) FROM df GROUP BY department_id HAVING department_id IS NOT NULL
C
SELECT department_id, AVG(salary) FROM df GROUP BY department_id EXCEPT SELECT department_id FROM df WHERE department_id IS NULL
D
SELECT department_id, AVG(salary) FROM df WHERE department_id IS NOT NULL
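
One way to compare options A and B is to run them on a small table. The sketch below is an illustration only. It uses Python's built-in sqlite3 module as a stand-in SQL engine, not Spark, and the sample rows and the helper name run are hypothetical. In Spark, 'df' would presumably first need to be exposed to SQL, for example as a temporary view. SQLite and Spark SQL differ in places, so treat the output as a demonstration of WHERE versus HAVING semantics rather than of Spark behavior.

```python
import sqlite3

# Hypothetical sample rows: (employee_id, department_id, salary).
rows = [
    (1, 10, 50000.0),
    (2, 10, 70000.0),
    (3, 20, 40000.0),
    (4, None, 90000.0),  # NULL department_id: must not appear in the result
]


def run(query):
    """Load the sample rows into an in-memory table named df and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE df (employee_id INTEGER, department_id INTEGER, salary REAL)"
    )
    conn.executemany("INSERT INTO df VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return sorted(result)


# Option A: rows are filtered before grouping.
option_a = (
    "SELECT department_id, AVG(salary) FROM df "
    "WHERE department_id IS NOT NULL GROUP BY department_id"
)

# Option B: all rows are grouped first, then the NULL group is discarded.
option_b = (
    "SELECT department_id, AVG(salary) FROM df "
    "GROUP BY department_id HAVING department_id IS NOT NULL"
)

result_a = run(option_a)
result_b = run(option_b)
print(result_a)
print(result_b)
```

On this sample both queries return the same rows. Conceptually, the difference is when the NULL rows are dropped: WHERE removes them before aggregation, while HAVING removes the NULL group after it has been aggregated.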