
Answer-first summary for fast verification
Answer: Leverage the file system's built-in compaction feature, if available, to periodically merge small files.
Option C is the correct approach: leveraging the file system's built-in compaction feature to periodically merge small files is common practice in big data systems such as Hadoop. Reducing the file count reduces metadata overhead and improves the performance of data processing tasks, since each small file otherwise consumes its own metadata entry and typically spawns its own processing task. Option A describes a property of the file system rather than a strategy you can apply. Option B is workable, but it adds complexity to the data ingestion phase. Option D is not recommended because it leaves the small-file problem unaddressed, which leads to inefficient processing.
Author: LeetQuiz Editorial Team
In a big data processing scenario, you are tasked with optimizing the handling of a large number of small files in a distributed file system. What strategies would you employ to compact these small files and why?
A
Use a distributed file system that inherently compacts small files.
B
Implement a custom script to merge small files into larger ones during the data ingestion phase.
C
Leverage the file system's built-in compaction feature, if available, to periodically merge small files.
D
Ignore the small files and process them as they are, without any compaction.