
You are preparing a dataset for a machine learning model. The dataset includes categorical variables, such as 'Product Category' with values like 'Electronics', 'Clothing', and 'Home Appliances', as well as numerical features. Your goal is to preprocess the data so that a linear regression model performs well, subject to constraints on computational efficiency and model interpretability. Which of the following techniques should you employ to handle the categorical data effectively? Choose the best option.
A
Data augmentation to artificially increase the size of the dataset by creating variations of the existing data points.
B
Normalization to scale all numerical features to a range between 0 and 1, without addressing the categorical data.
C
One-hot encoding to transform each category value into a new binary column, enabling the model to interpret categorical variables numerically.
D
Standardization to adjust all features to have a mean of 0 and a standard deviation of 1, focusing solely on numerical features.
E
Both one-hot encoding for categorical variables and standardization for numerical features, so that all data is appropriately scaled and interpretable by the model.
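For reference, the two transformations named in options C, D, and E can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the toy rows, the column meanings (price, units sold), and the helper names `one_hot` and `standardize` are all hypothetical and do not come from the question.

```python
from statistics import mean, pstdev

# Hypothetical toy rows: (product category, price, units sold).
rows = [
    ("Electronics", 200.0, 3.0),
    ("Clothing", 50.0, 10.0),
    ("Home Appliances", 120.0, 5.0),
    ("Clothing", 30.0, 12.0),
]

def one_hot(values):
    """Map each value to a binary vector with one column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

categories, cat_columns = one_hot([r[0] for r in rows])
price_z = standardize([r[1] for r in rows])
units_z = standardize([r[2] for r in rows])

# Combined feature matrix: one-hot columns, then standardized numeric columns.
features = [cat + [p, u] for cat, p, u in zip(cat_columns, price_z, units_z)]
```

Each row of `features` holds one binary column per category followed by the two rescaled numeric columns. When the linear model also fits an intercept, one of the binary columns is commonly dropped to avoid perfect collinearity among the encoded columns.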