Google Professional Data Engineer

You are developing a model that predicts the likelihood of rainfall on a given day. The dataset has a very large number of input features, and you want to speed up training by removing some of them while minimizing the impact on the model's accuracy. What can you do?

A. Eliminate features that are highly correlated to the output labels.
B. Combine highly co-dependent features into one representative feature.
C. Average feature values in batches.
D. Remove features that have a high percentage of null values.

Explanation:

The correct answer is B. Combining highly co-dependent features into one representative feature reduces the dimensionality of the dataset, which speeds up training. It also preserves most of the information, because co-dependent features largely duplicate one another. The other options are weaker. Eliminating features that are highly correlated to the output labels (A) discards the features with the most predictive value, which hurts accuracy. Averaging feature values in batches (C) blurs the distinct information each feature carries. Removing features with a high percentage of null values (D) does reduce dimensionality, but it does not target redundancy between features and can drop a sparse feature that is still predictive.
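Option B can be illustrated with a minimal sketch in plain Python. It assumes the features are held as equal-length numeric columns in a dict. The function names (`pearson`, `zscore`, `combine_codependent`), the 0.9 threshold, the greedy seed-based grouping, and the choice to average sign-aligned z-scores are all illustrative assumptions. They are not part of any Google Cloud or ML library API.

```python
import math
from typing import Dict, List

# Hypothetical helpers for illustration; not a Google Cloud or library API.


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def zscore(x: List[float]) -> List[float]:
    """Standardize a column to mean 0 and population standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    if sd == 0:
        return [0.0] * n
    return [(a - m) / sd for a in x]


def combine_codependent(
    features: Dict[str, List[float]], threshold: float = 0.9
) -> Dict[str, List[float]]:
    """Greedily group features whose |correlation| with a seed feature is >= threshold,
    and replace each multi-feature group with the mean of its sign-aligned z-scores."""
    used = set()
    result: Dict[str, List[float]] = {}
    for seed in features:
        if seed in used:
            continue
        used.add(seed)
        group = [seed]
        for other in features:
            if other not in used and abs(pearson(features[seed], features[other])) >= threshold:
                group.append(other)
                used.add(other)
        if len(group) == 1:
            # No strongly co-dependent partner: keep the feature unchanged.
            result[seed] = list(features[seed])
            continue
        n = len(features[seed])
        combined = [0.0] * n
        for name in group:
            # Flip negatively correlated members so they reinforce the seed instead of cancelling it.
            sign = 1.0 if pearson(features[seed], features[name]) >= 0 else -1.0
            z = zscore(features[name])
            for i in range(n):
                combined[i] += sign * z[i] / len(group)
        result["+".join(group)] = combined
    return result


if __name__ == "__main__":
    columns = {
        "temp_c": [10, 15, 20, 25, 30],
        "temp_f": [50, 59, 68, 77, 86],  # same signal as temp_c, in different units
        "humidity": [80, 60, 90, 40, 70],
    }
    reduced = combine_codependent(columns)
    print(sorted(reduced))  # ['humidity', 'temp_c+temp_f']
```

In the usage example, temperature in Celsius and Fahrenheit collapses into one representative column, and humidity stays as-is. The result has two features instead of three. Averaging standardized values is only one way to build the representative feature. Running PCA on each correlated group and keeping the first component is a common alternative. On real data, you would compute the groups from the training split only. You would then apply the same grouping to the evaluation data.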