
As a junior data scientist at a government institution, you are tasked with preparing a dataset for a linear regression model used in demographic research. The dataset is stored in BigQuery and includes various demographic features. You have two objectives: avoid multicollinearity by combining the 'country' and 'language' features into a single categorical variable, and categorize the 'income' feature into 5 quantile-based classes for better model performance. Given the data privacy constraints and the need for scalable processing, which of the following BigQuery ML functions should you use to achieve these objectives? (Select 2)
A
ARRAY_CONCAT - Merges arrays into a single array; not suitable for creating categorical variables from distinct features.
B
QUANTILE_BUCKETIZE - Categorizes a continuous numerical feature into quantile-based buckets, ideal for dividing income into 5 classes.
C
FEATURE_CROSS - Creates a new feature by combining two or more input features, perfect for merging country and language into a single variable.
D
ST_AREA - Calculates the area in square meters of a GEOGRAPHY; irrelevant for demographic data categorization.
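
To show how options B and C fit together, here is a minimal sketch in Python that only assembles the SQL text; it does not connect to BigQuery. It assumes the functions are called with their `ML.` prefix (`ML.FEATURE_CROSS`, `ML.QUANTILE_BUCKETIZE`), that the columns are named `country`, `language`, and `income` as in the question, and that the table name and the helper `build_preprocessing_query` are hypothetical placeholders.

```python
def build_preprocessing_query(table: str, num_buckets: int = 5) -> str:
    """Assemble a BigQuery SQL string for the two preprocessing steps.

    `table` is a hypothetical fully qualified table name; the column
    names come from the question text, not from a real schema.
    """
    return (
        "SELECT\n"
        # Option C: cross the two categorical features into one.
        "  ML.FEATURE_CROSS(STRUCT(country, language)) AS country_language,\n"
        # Option B: split income into quantile-based buckets.
        # The empty OVER () makes the quantiles span all rows.
        f"  ML.QUANTILE_BUCKETIZE(income, {num_buckets}) OVER () AS income_bucket\n"
        f"FROM `{table}`"
    )


# Hypothetical table name, used only for illustration.
query = build_preprocessing_query("my_project.demographics.people")
print(query)
```

The same two expressions could also go inside a `TRANSFORM` clause of a `CREATE MODEL` statement, so that the preprocessing is applied again automatically at prediction time.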