Google Professional Machine Learning Engineer

Get started today

Ultimate access to all questions.

Explanation:

Correct Option: C. One-hot encoding
One-hot encoding is the most appropriate technique for transforming categorical data into a numerical format that machine learning algorithms can process. It involves creating a binary column for each category, where a '1' signifies the presence of the category and a '0' indicates its absence. This method preserves the information in the categorical variables without introducing an ordinal relationship where none exists.

Why other options are incorrect:

A. Data augmentation: This technique is used to artificially expand the size of a dataset by generating new data points, primarily for image or text data, not for handling categorical data.
B. Normalization: This process scales numerical features to a common range (e.g., 0 to 1) but does not apply to categorical data.
D. Standardization: Similar to normalization, standardization adjusts numerical features to have a mean of 0 and a standard deviation of 1, which is not suitable for categorical data.

Explanation:

Why other options are incorrect:

A. Data augmentation: This technique is used to artificially expand the size of a dataset by generating new data points, primarily for image or text data, not for handling categorical data.
B. Normalization: This process scales numerical features to a common range (e.g., 0 to 1) but does not apply to categorical data.
D. Standardization: Similar to normalization, standardization adjusts numerical features to have a mean of 0 and a standard deviation of 1, which is not suitable for categorical data.

Comments (0)

No comments yet.

In the context of preparing data for machine learning models, you are working with a dataset that includes categorical variables such as 'color' with values like 'red', 'blue', and 'green'. The dataset also contains numerical features. Your goal is to preprocess this data to ensure it is suitable for a machine learning algorithm that requires numerical input. Considering the need for scalability and the preservation of information, which of the following techniques should you employ to transform the categorical data into a format suitable for machine learning algorithms? Choose one correct option.

Real Exam

Data augmentation

0.0%

Normalization

0.0%

One-hot encoding

100.0%

Standardization