Google Professional Machine Learning Engineer

Get started today

Ultimate access to all questions.

In the process of developing a deep neural network classification model, you encounter a dataset with categorical input values where some columns have a high cardinality of over 10,000 unique values. Considering the need for efficiency and scalability in model training, along with the constraints of managing high-dimensional data, what is the most effective method to encode these categorical values? Choose the best option.

Real Exam

Transform each categorical value into an integer value, which may lead to an arbitrary ordering of categories and potentially mislead the model.

0.0%

Encode the categorical variables into a vector of boolean values, which could result in an excessively sparse and high-dimensional representation.

4.5%

Comments

Loading comments...

Use one-hot hash buckets to convert the categorical string data, efficiently managing the dimensionality by representing each unique value as a binary vector.

86.4%

Apply run-length encoding to convert each categorical value into a string, which is not suitable for non-sequential data and may not capture the categorical nature effectively.

9.1%