
Ultimate access to all questions.
In the process of developing a deep neural network classification model, you encounter a dataset with categorical input values where some columns have a high cardinality of over 10,000 unique values. Considering the need for efficiency and scalability in model training, along with the constraints of managing high-dimensional data, what is the most effective method to encode these categorical values? Choose the best option.
A
Transform each categorical value into an integer value, which may lead to an arbitrary ordering of categories and potentially mislead the model.
B
Encode the categorical variables into a vector of boolean values, which could result in an excessively sparse and high-dimensional representation.
C
Use one-hot hash buckets to convert the categorical string data, efficiently managing the dimensionality by representing each unique value as a binary vector.
D
Apply run-length encoding to convert each categorical value into a string, which is not suitable for non-sequential data and may not capture the categorical nature effectively.