
Explanation:
The maxFilesPerTrigger option (set on the stream as cloudFiles.maxFilesPerTrigger) caps the number of new files Auto Loader processes in each micro-batch, and its default value is 1000. With millions of files arriving daily, that cap keeps every micro-batch small, so a scaled-up cluster has no additional work to pick up per batch. Increasing the value lets each micro-batch take in more files, which spreads the per-batch overhead of file discovery and setup across more data and lets Auto Loader use the larger Databricks cluster for higher throughput. A higher value also needs more compute resources per batch, so it is essential to find the right balance; setting it too high might strain the cluster. The other options (merging files, setting up a second Auto Loader process, copying data to local disk, or deeming Auto Loader unsuitable) are not the most direct or scalable solutions for this scenario.
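As a rough illustration of the explanation above, the sketch below builds the Auto Loader options with a raised cloudFiles.maxFilesPerTrigger and estimates how many micro-batches a backlog would need. The helper names, the example value of 50,000, and the file counts are assumptions chosen for illustration, not recommended settings; the right value depends on file sizes and cluster capacity.

```python
def auto_loader_options(file_format, schema_location, max_files_per_trigger=50_000):
    """Build reader options for Auto Loader (hypothetical helper).

    The 50,000 default here is an illustrative assumption, not a
    recommendation; Auto Loader's own default is 1000.
    """
    if max_files_per_trigger <= 0:
        raise ValueError("max_files_per_trigger must be positive")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def micro_batches_needed(total_files, max_files_per_trigger):
    """Estimate the micro-batch count if only the file cap limits batch size."""
    # Ceiling division without floating point.
    return -(-total_files // max_files_per_trigger)


def build_stream(spark, source_path, options):
    """Create the streaming reader; `spark` is an existing SparkSession."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Hypothetical backlog of two million files:
default_batches = micro_batches_needed(2_000_000, 1000)    # 2000 micro-batches
raised_batches = micro_batches_needed(2_000_000, 50_000)   # 40 micro-batches
```

On a Databricks cluster, build_stream(spark, "<source path>", auto_loader_options("json", "<schema path>")) would return the streaming DataFrame; both paths are placeholders to fill in for your environment.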
You are using Auto Loader to process millions of files daily and notice a slowdown in the load process. After you scale up the Databricks cluster, Auto Loader's performance does not improve. What is the most effective solution to this issue?
A
Merge the files into one large file
B
Increase the maxFilesPerTrigger option to a sufficiently high number
C
Set up a second Auto Loader process to process the data
D
Copy the data from cloud storage to local disk on the cluster for faster access
E
Auto Loader is not suitable for processing millions of files a day