Consider a scenario where a notebook is processing large datasets using PySpark and is experiencing performance issues due to data shuffling. Explain how you would identify this issue and what steps you would take to resolve it. Specifically, discuss the use of broadcast joins and caching techniques.
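A minimal PySpark sketch of the two techniques the question names is shown below. The table names, paths, and join key (`orders`, `countries`, `country_code`) are illustrative assumptions, not part of the question; the pattern is what matters: use `explain()` (or the Spark UI's shuffle read/write metrics) to spot shuffle-heavy plans, broadcast the small side of a join so the large table is joined locally without a shuffle, and cache a DataFrame that several downstream actions reuse.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("shuffle-tuning-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small lookup table.
orders = spark.read.parquet("/data/orders")        # large dataset
countries = spark.read.parquet("/data/countries")  # small dimension table

# Broadcast join: the small table is copied to every executor, so the
# large table is joined in place instead of being shuffled across the cluster.
joined = orders.join(broadcast(countries), on="country_code", how="left")

# Cache a DataFrame that is reused by multiple actions so the upstream
# work (including any remaining shuffle) is performed only once.
joined.cache()
joined.count()  # first action materializes the cache

# Inspect the physical plan: a BroadcastHashJoin in place of a
# SortMergeJoin, and fewer "Exchange" operators, indicate less shuffling.
joined.explain()
```

A reasonable answer would also note the trade-offs: broadcasting only works when one side of the join is small enough to fit in executor memory (governed by `spark.sql.autoBroadcastJoinThreshold`), and caching helps only when the DataFrame is actually reused and should be released with `unpersist()` when no longer needed.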