
Answer-first summary for fast verification
Answer: B — Combine the small files into larger files before processing, to reduce the number of file reads and minimize overhead.
Combining the small files into larger files before processing is the most effective approach because it addresses the root cause of the performance issue: the per-file overhead (listing, opening, and scheduling reads) incurred when a dataflow reads a large number of small files. Increasing the Data Flow's core count and memory (A) may offer some improvement but does not remove the small-file overhead and increases cost. Converting the files to a more efficient format such as Parquet (C) improves storage and scan efficiency but, on its own, leaves the same number of small files to enumerate and open. Implementing a custom pre-filtering activity (D) could reduce data volume but adds pipeline complexity and still does not address the small-file read inefficiency.
Author: LeetQuiz Editorial Team
As a Microsoft Fabric Analytics Engineer Associate, you are tasked with optimizing a data pipeline in Azure Data Factory that processes a dataflow reading from a large number of small files stored in Azure Blob Storage. The current performance is suboptimal due to the overhead of processing numerous small files. Considering the constraints of cost, compliance, and scalability, which of the following approaches would BEST improve the performance of the dataflow? Choose one option.
A
Increase the Data Flow's core count and memory allocation to enhance processing power, despite the potential increase in cost.
B
Combine the small files into larger files before processing, to reduce the number of file reads and minimize overhead.
C
Convert the files to a more efficient format like Parquet, without addressing the small file issue directly.
D
Implement a custom activity to pre-filter the data in the files before the dataflow processes them, adding complexity to the pipeline.