In a scenario where you are analyzing customer transaction data to identify unique customers and their total spending, you are given a dataset with 'customer_id' and 'transaction_amount' columns. Your task is to write a Spark SQL query that creates a new table containing one row per unique 'customer_id' along with that customer's total transaction amount. Consider the following constraints: the solution must scale efficiently to large datasets and should minimize computational cost. Which of the following queries best meets these requirements? (Choose one option)
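For reference, the kind of query the question is describing is a simple aggregation: grouping by 'customer_id' and summing 'transaction_amount'. The sketch below illustrates that shape; the source table name 'transactions' and the output table name 'customer_totals' are assumptions for illustration, since the question does not name them.

    -- Illustrative sketch only: 'transactions' and 'customer_totals' are assumed names.
    -- GROUP BY produces exactly one row per customer_id, and SUM gives the total spending.
    CREATE TABLE customer_totals AS
    SELECT
        customer_id,
        SUM(transaction_amount) AS total_spending
    FROM transactions
    GROUP BY customer_id;

A grouped aggregation like this is generally preferable to approaches that first deduplicate rows and then join back, because Spark can combine partial sums per partition before shuffling, which keeps the computation scalable on large datasets.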