
You are tasked with improving the performance of a notebook that processes large datasets using PySpark; it currently takes several hours to complete. Describe the steps you would take to identify performance bottlenecks in the notebook and to implement improvements. Consider both code-level optimizations and configuration settings.
A. Increase the number of executors and memory allocation.
B. Rewrite the PySpark code to use more efficient transformations and actions.
C. Reduce the dataset size by filtering unnecessary data.
D. Use dynamic partitioning to distribute data processing tasks.
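As a rough illustration of how the configuration-oriented options (A and D) translate into Spark settings, the fragment below shows a spark-submit invocation. The flag and property names are standard Spark configuration keys; the specific values and the script name `process_data.py` are illustrative assumptions, not tuned recommendations for any particular workload.

```shell
# Illustrative spark-submit invocation; values are assumptions, not tuned recommendations.
# Option A: raise executor count, cores, and memory per executor.
# Option D: enable dynamic allocation and adaptive query execution so Spark
#           resizes shuffle partitions and executor count based on the data.
spark-submit \
  --num-executors 8 \
  --executor-memory 8g \
  --executor-cores 4 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.shuffle.partitions=400 \
  process_data.py
```

The code-level options (B and C) are complementary: before touching configuration, one would typically inspect the Spark UI for skewed or spilling stages, push filters and column pruning as early as possible in the job, and replace wide shuffles or Python UDFs with built-in DataFrame operations where feasible.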