Explain the concept of feature selection in machine learning and why it matters for model performance. How would you approach feature selection in a distributed environment using Spark ML, and which tools would you use to identify and select the most relevant features?