
You are working on a big data project where you need to process a large number of small files in Azure Data Lake Storage. Describe the steps you would take to compact these small files into larger ones to optimize processing. Include details on the tools and methods you would use, and explain how this approach helps in reducing the overhead associated with processing numerous small files.
A. Use Azure Data Factory to copy files without any aggregation.
B. Use Azure Databricks with a custom Python script to read and merge small files into larger ones (a sketch of this approach follows the options).
C. Manually download and compress files using a local machine.
D. Ignore the issue as it does not significantly impact performance.
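
Option B describes the usual compaction pattern: read the many small files into a single Spark DataFrame on Azure Databricks, repartition, and write them back as a small number of larger files. The snippet below is a minimal sketch of such a job in PySpark; the storage account, container and folder names, the Parquet format, and the target file count are all illustrative assumptions, not details given in the question.

```python
# Minimal PySpark sketch of small-file compaction in ADLS Gen2.
# Assumptions (hypothetical): the abfss paths below, Parquet as the format,
# and that the Databricks cluster already has credentials for the account.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("small-file-compaction").getOrCreate()

# Hypothetical source (many small files) and destination (compacted) paths.
source_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/events/2024/"
target_path = "abfss://curated@examplestorageacct.dfs.core.windows.net/events_compacted/2024/"

# Read every small Parquet file under the source folder into one DataFrame.
df = spark.read.parquet(source_path)

# Assumption: a fixed partition count chosen to yield reasonably large output
# files (e.g. a few hundred MB each); tune this to the actual data volume.
target_file_count = 16

# coalesce() reduces the number of partitions without a full shuffle;
# use repartition() instead if the data is skewed and needs even redistribution.
(df.coalesce(target_file_count)
   .write
   .mode("overwrite")
   .parquet(target_path))
```

Writing fewer, larger files reduces the per-file overhead the question asks about: the engine spends less time listing and opening files, and the job schedules a handful of well-sized tasks instead of thousands of tiny ones.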