You are working with a large dataset in Azure Databricks that contains 'customer_id' and 'order_id' columns. Your task is to validate that each 'customer_id' is unique across all rows; this check is critical for a downstream reporting process that relies on the uniqueness of 'customer_id' for accurate customer analytics. Given the constraints of minimizing compute resources and keeping the solution scalable for datasets of varying sizes, which of the following Spark SQL queries would you use to accurately count the number of rows that violate the uniqueness constraint on 'customer_id'? Choose the best option.
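For context, one common shape for this kind of uniqueness check is sketched below. This is an illustration of the general pattern only, not necessarily one of the answer options; the table name `orders` is a hypothetical stand-in, since the question does not name the table. The query groups by 'customer_id' and sums the sizes of groups that contain more than one row, so it counts every row involved in a duplicate:

```sql
-- Sketch only; 'orders' is a hypothetical table name.
-- Counts every row whose customer_id appears more than once.
SELECT COALESCE(SUM(cnt), 0) AS violating_rows
FROM (
  SELECT customer_id, COUNT(*) AS cnt
  FROM orders
  GROUP BY customer_id
  HAVING COUNT(*) > 1
) AS duplicates;
```

A pattern like this scans the table once and requires a single shuffle for the GROUP BY aggregation, which is relevant to the compute-resource and scalability constraints the question emphasizes.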