When Delta Lake MERGE operations frequently produce small files, which strategy ensures optimal file management without compromising upsert performance?
A
Enable the spark.databricks.delta.merge.repartitionBeforeWrite configuration for all merge operations, regardless of workload size.
B
Pre-partition both source and target datasets by merge keys to reduce small file creation.
C
Routinely use the OPTIMIZE command post-merge with a high-frequency schedule.
D
Disable file compaction and rely on manual optimization routines.
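For context on the pattern option C describes, below is a minimal sketch of a post-merge compaction job: an upsert via MERGE INTO followed by OPTIMIZE to compact small files. The table and column names (`events`, `events_updates`, `event_id`, `event_date`) are hypothetical; in a real pipeline each generated statement would be executed with `spark.sql(...)` on a cluster where Delta Lake is available.

```python
from typing import Optional

# Sketch only: builds the SQL text for a Delta Lake upsert-then-compact
# maintenance job. Table/column names are hypothetical placeholders.

def merge_sql(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE (upsert) statement keyed on `key`."""
    return (
        f"MERGE INTO {target} t USING {source} s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

def optimize_sql(target: str, zorder_col: Optional[str] = None) -> str:
    """Build an OPTIMIZE statement; ZORDER BY co-locates related rows."""
    stmt = f"OPTIMIZE {target}"
    if zorder_col:
        stmt += f" ZORDER BY ({zorder_col})"
    return stmt

# A scheduled job would run these in order via spark.sql(...):
statements = [
    merge_sql("events", "events_updates", "event_id"),
    optimize_sql("events", "event_date"),
]
for s in statements:
    print(s)
```

Running OPTIMIZE on a schedule (rather than after every merge) amortizes the compaction cost, which is the trade-off the question asks about.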