
Databricks Certified Data Engineer - Professional
You are processing a large dataset with Spark Structured Streaming in a cost-sensitive environment. Query performance has degraded significantly because frequent small writes to the storage system have produced a large number of tiny files. Which of the following strategies would MOST effectively address the tiny-file problem, improving query performance while minimizing cost and preserving scalability? Choose one option.