
Answer-first summary for fast verification
Answer: Principal component analysis (PCA)
Principal component analysis (PCA) is the most appropriate technique here: it reduces the number of features by building new variables as linear combinations of the original ones. These new variables, the principal components, are ordered by the variance they explain, so the first few can replace the original features while retaining most of the variance in the data. The components are also mutually uncorrelated (not necessarily independent), which removes multicollinearity among the inputs and makes linear regression coefficients more stable.

- **Option A (Embeddings)** is incorrect because embeddings are primarily used to map large sparse vectors, especially high-cardinality categorical data, into smaller dense vectors; they are not a general way to compress the numeric features of a linear regression model.
- **Option C (Feature Crosses)** is incorrect because feature crosses add new features to capture non-linear interactions, so they enlarge the feature set rather than reduce it.
- **Option D (Functional Data Analysis)** is not suitable because it replaces features with functions, which does not serve the goal of reducing the number of features in this context.

For further reading:

- [Google Developers - Embeddings](https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data)
- [Built In - PCA Explanation](https://builtin.com/data-science/step-step-explanation-principal-component-analysis)
- [Wikipedia - PCA](https://en.wikipedia.org/wiki/Principal_component_analysis)
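To make the PCA answer concrete, here is a minimal sketch using only NumPy. It is an illustration under stated assumptions, not part of the original question: the data is synthetic (50 features driven by 3 hidden factors), and the helper names `pca_fit` and `pca_transform` are hypothetical. It projects the features onto the top principal components and then fits ordinary least squares on those components instead of the original features.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return feature means, top principal directions, and their variance ratios."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing singular value (i.e. decreasing explained variance).
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

# Synthetic data (assumption for illustration): 50 observed features
# generated from 3 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))
y = latent @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=500)

mean, components, explained = pca_fit(X, n_components=3)
Z = pca_transform(X, mean, components)  # 500 x 3 instead of 500 x 50

# Ordinary least squares on the 3 components (plus an intercept column).
design = np.column_stack([np.ones(len(Z)), Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Covariance of the components: off-diagonal entries are ~0 (uncorrelated).
cov = np.cov(Z, rowvar=False)

print("variance retained:", explained.sum())
print("R^2 on components:", r2)
```

In practice the number of components is usually chosen from the cumulative explained variance, and features on different scales are typically standardized before PCA. Because PCA ignores the target, it is worth checking the regression's accuracy after the reduction rather than assuming it is preserved.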
Author: LeetQuiz Editorial Team
In the context of optimizing a Linear Regression model for supply management across a sales network, you are faced with a dataset comprising a vast array of driving factors. The model's current performance is hindered by the high dimensionality of the feature space, leading to inefficiencies in both training and inference times. Your objective is to reduce the number of features without significantly compromising the model's predictive accuracy. Considering the need for computational efficiency and the preservation of valuable information, which of the following techniques would be most appropriate? (Choose one correct option)
A. Embeddings
B. Principal component analysis (PCA)
C. Feature Crosses
D. Functional Data Analysis