
Answer-first summary for fast verification
Answer: B — Utilize one-hot encoding with hash buckets to efficiently represent the categorical data.
One-hot encoding with hash buckets (the "hashing trick") is the most effective method for high-cardinality categorical data: each category is hashed into one of a fixed number of buckets, which bounds the input dimensionality regardless of vocabulary size. This avoids storing a 10,000+-entry vocabulary per column, keeps the representation compact for model training, and scales to unseen categories at serving time — all of which matter for a real-time, cost-sensitive system. Option E proposes a hybrid: exact one-hot encoding for columns with fewer than 100 unique values and hash buckets above that threshold. This is worth considering when a dataset mixes low- and high-cardinality features, since small vocabularies gain nothing from hashing and lose nothing to collisions. For the specific scenario described, however, where the problematic columns all exceed 10,000 unique values, one-hot encoding with hash buckets (Option B) remains the optimal choice.
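To make the bucketing concrete, here is a minimal sketch of hash-bucket one-hot encoding. The function name, the use of MD5, and the bucket count are illustrative choices, not a prescribed API; production systems typically use a framework's built-in hashed feature column instead.

```python
import hashlib

def hash_bucket_one_hot(value, num_buckets=1000):
    """Map a categorical string to a fixed-size one-hot vector via hashing.

    num_buckets is an illustrative default; in practice it is tuned to
    trade off hash collisions against vector size.
    """
    # Use a stable hash (the built-in hash() is salted per process,
    # so it would encode the same category differently across runs).
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    vec = [0] * num_buckets
    vec[bucket] = 1
    return vec

# A column with 10,000+ unique values collapses to num_buckets
# dimensions, and unseen categories at serving time still map
# deterministically to some bucket.
v = hash_bucket_one_hot("customer_segment_4821", num_buckets=8)
```

Note that distinct categories can collide in the same bucket; with a reasonable bucket count this acts as a mild, acceptable regularizer rather than a correctness problem.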
Author: LeetQuiz Editorial Team
In the context of developing a deep neural network for classification tasks, you are presented with a dataset that includes categorical inputs. Some of these categorical columns contain over 10,000 unique values, posing a challenge for effective encoding. The dataset is part of a larger project aimed at predicting customer behavior in real-time, requiring the solution to be scalable and cost-efficient. Given these constraints, which of the following methods is MOST effective for encoding these high-cardinality categorical values as model inputs? Choose the best option.
A
Assign a unique numerical value to each categorical value, maintaining a direct mapping.
B
Utilize one-hot encoding with hash buckets to efficiently represent the categorical data.
C
Convert each categorical variable into a vector of boolean values, indicating the presence or absence of each category.
D
Apply run-length encoding to compress the categorical string data, reducing the input size.
E
Implement both one-hot encoding for categories with fewer than 100 unique values and hash buckets for those exceeding this threshold.