
In the context of preparing a large dataset for machine learning, you are tasked with reducing its dimensionality to improve model performance and lower computational cost. The dataset contains hundreds of features, some of which are highly correlated. Given the need to preserve as much of the original variance as possible and to make the data easier to visualize, which of the following techniques would be the MOST appropriate to achieve these goals? Choose one correct option.
A. Gradient Descent, as it optimizes the model parameters to minimize the loss function, indirectly reducing dimensionality by focusing on the most relevant features.
B. Random Forest, which can inherently select important features during the construction of decision trees, thus reducing dimensionality.
C. Principal Component Analysis (PCA), a technique designed to transform a large set of variables into a smaller one that still contains most of the information in the large set.
D. K-Nearest Neighbors (KNN), which reduces dimensionality by considering only the closest neighbors in the feature space for making predictions.
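To see why PCA fits the scenario in option C, here is a minimal sketch using scikit-learn on a synthetic dataset with correlated features (the data, the 95% variance threshold, and all variable names are illustrative assumptions, not part of the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic dataset: 200 samples, 10 features generated from only
# 2 latent factors plus small noise, so the features are highly correlated.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Passing a float < 1 asks PCA to keep just enough components
# to preserve that fraction of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # fewer columns than the original 10
print(pca.explained_variance_ratio_.sum())   # at least 0.95 by construction
```

Because the retained components are ordered by explained variance, the first two can also be plotted directly, which is what makes PCA useful for visualization as well as for reducing computational cost.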