In a multi-tenant Spark environment where each tenant's data is processed in isolation but resides within the same cluster, what is the best approach to optimize data aggregation queries for fairness and efficiency across tenants?
A. Use view-based access controls to isolate tenant data and apply CACHE directives to frequently aggregated datasets for each tenant, ensuring efficient reuse of computation results.
B. Implement custom weighted fair scheduler pools for each tenant and assign Spark jobs to these pools based on the tenant's priority.
C. Leverage Spark SQL's broadcast join for common aggregation queries to minimize shuffle and balance load.
D. Utilize dynamic resource allocation to adjust resource distribution based on the current load of each tenant's queries.
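
For readers unfamiliar with the mechanism named in option B, the following is a minimal sketch of how per-tenant fair scheduler pools are usually described to Spark: an allocation file with one `<pool>` element per tenant, each carrying a `schedulingMode`, `weight`, and `minShare`. The tenant names, weights, and the helper `build_fair_scheduler_xml` are hypothetical and chosen only for illustration; the sketch uses the Python standard library alone and does not start Spark.

```python
import xml.etree.ElementTree as ET


def build_fair_scheduler_xml(tenant_weights, min_share=1):
    """Return allocation-file XML with one FAIR pool per tenant.

    tenant_weights maps a (hypothetical) tenant name to its pool weight.
    """
    root = ET.Element("allocations")
    for tenant, weight in sorted(tenant_weights.items()):
        pool = ET.SubElement(root, "pool", name=tenant)
        # Jobs inside a pool share resources fairly rather than FIFO.
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        # A pool with weight 2 is offered roughly twice the share of weight 1.
        ET.SubElement(pool, "weight").text = str(weight)
        # Minimum number of cores the pool should receive when it has work.
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


# Hypothetical tenants and weights, for illustration only.
xml_text = build_fair_scheduler_xml({"tenant_a": 2, "tenant_b": 1})
print(xml_text)

# With PySpark available, the file would be wired in roughly like this
# (the path is a placeholder):
#   conf.set("spark.scheduler.mode", "FAIR")
#   conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
#   sc.setLocalProperty("spark.scheduler.pool", "tenant_a")  # per submitting thread
```

The pool is selected per thread through the `spark.scheduler.pool` local property, so a service that submits jobs on behalf of several tenants would set it before running each tenant's queries.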