
Answer-first summary for fast verification
Answer: Increase the `maxFilesPerTrigger` option to a sufficiently high number
Auto Loader's `maxFilesPerTrigger` option (set as `cloudFiles.maxFilesPerTrigger`) defaults to 1000 files per micro-batch. With millions of files arriving daily, this cap rather than cluster size limits throughput: each micro-batch picks up at most 1000 files no matter how many cores are available, which is why scaling up the cluster did not help. Raising the limit lets each micro-batch carry more files, so the fixed per-batch cost of file discovery and setup is spread over more work and the larger cluster is actually used. Larger batches need more compute and memory, so increase the value in steps; setting it too high can strain the cluster. The other options (merging files, setting up a second Auto Loader process, copying data to local disk, or deeming Auto Loader unsuitable) are not the most direct or scalable solutions for this scenario.
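As a rough illustration of the explanation above, the following minimal sketch shows how the option might be set on an Auto Loader stream and how the batch count changes with the limit. The helper names (`auto_loader_options`, `micro_batches_needed`, `build_stream`), the paths, and the value 50,000 are assumptions made for illustration, not recommendations; the right limit depends on file sizes and cluster capacity.

```python
def auto_loader_options(file_format, schema_location, max_files_per_trigger=1000):
    """Build the option map for an Auto Loader (cloudFiles) stream.

    Hypothetical helper: only the option keys are Auto Loader names.
    """
    if max_files_per_trigger < 1:
        raise ValueError("max_files_per_trigger must be a positive integer")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Upper bound on the number of new files picked up per micro-batch.
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def micro_batches_needed(total_files, max_files_per_trigger):
    """Minimum number of micro-batches needed to drain a backlog (ceiling division)."""
    return (total_files + max_files_per_trigger - 1) // max_files_per_trigger


def build_stream(spark, source_path, options):
    """Create the streaming DataFrame.

    Assumes `spark` is the active SparkSession on a Databricks cluster,
    where the "cloudFiles" source is available.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


if __name__ == "__main__":
    backlog = 5_000_000  # illustrative daily file count
    print(micro_batches_needed(backlog, 1000))    # 5000 batches at the default
    print(micro_batches_needed(backlog, 50_000))  # 100 batches at the raised limit
```

On a Databricks cluster, the stream would then be created with something like `build_stream(spark, "/path/to/landing", auto_loader_options("json", "/path/to/schema", 50_000))`, where both paths are placeholders.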
Author: LeetQuiz Editorial Team
You are using Auto Loader to process millions of files daily and have noticed a slowdown in the load process. After you scaled up the Databricks cluster, Auto Loader's performance did not improve. What is the most effective solution to this issue?
A. Merge the files into one large file
B. Increase the `maxFilesPerTrigger` option to a sufficiently high number
C. Set up a second Auto Loader process to process the data
D. Copy the data from cloud storage to local disk on the cluster for faster access
E. Auto Loader is not suitable for processing millions of files a day