A data engineer is troubleshooting a slow-running query on a large Delta table named transactions. Two causes have been identified: the table consists of many small files, and the transactionId column, which has high cardinality and whose values are sparsely distributed across these files, is frequently used in filters. Which command should the data engineer use to optimize performance?
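
The answer options are not reproduced here, so the following is only a sketch of one command that addresses both symptoms. In Delta Lake, OPTIMIZE compacts small files, and its ZORDER BY clause co-locates rows with similar values of the listed columns, which helps data skipping on a high-cardinality filter column. The helper name build_optimize_statement is hypothetical, and running the statement assumes an existing SparkSession called spark with Delta Lake enabled.

```python
from typing import List


def build_optimize_statement(table: str, zorder_columns: List[str]) -> str:
    """Build a Delta Lake OPTIMIZE ... ZORDER BY statement (hypothetical helper)."""
    if not zorder_columns:
        raise ValueError("at least one Z-order column is required")
    columns = ", ".join(zorder_columns)
    # OPTIMIZE compacts small files; ZORDER BY clusters data by the given columns.
    return f"OPTIMIZE {table} ZORDER BY ({columns})"


statement = build_optimize_statement("transactions", ["transactionId"])
print(statement)

# Assumption: `spark` is an active SparkSession with Delta Lake configured.
# It is not created here, so the call is left commented out.
# spark.sql(statement)
```

Z-ordering is chosen here over partitioning because partitioning on a high-cardinality column such as transactionId would tend to produce even more small files.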