
Answer-first summary for fast verification
Answer: FEATURE_CROSS (creates a new feature by combining two or more input features, which fits merging country and language into a single variable) and QUANTILE_BUCKETIZE (categorizes a continuous numerical feature into quantile-based buckets, which fits dividing income into 5 classes).
**FEATURE_CROSS** (called as `ML.FEATURE_CROSS` in BigQuery ML) combines 'country' and 'language' into a single categorical variable, so the model receives one combined feature instead of two correlated ones, which addresses the multicollinearity concern. **QUANTILE_BUCKETIZE** (called as `ML.QUANTILE_BUCKETIZE`) splits the continuous 'income' feature into 5 quantile-based classes, each holding roughly the same number of rows.

- **ARRAY_CONCAT** is incorrect because it merges arrays; it does not create a new categorical variable from distinct features.
- **ST_AREA** is irrelevant here because it calculates the area of a GEOGRAPHY value and has nothing to do with demographic data processing.

Both chosen functions run inside BigQuery, so the data is transformed where it is stored, which suits the data privacy and scalability constraints. For further reading, refer to the Google BigQuery ML documentation and resources on linear regression assumptions.
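To make the two transformations concrete, here is a minimal Python sketch of the ideas behind a feature cross and quantile bucketing. It is an illustration only, not BigQuery's implementation: the helper names `feature_cross` and `quantile_bucketize`, the `_` separator, the `bin_N` labels, and the sample rows are all assumptions made for this example, and BigQuery ML may compute its bucket boundaries differently.

```python
def feature_cross(row, keys, sep="_"):
    """Join the categorical values of `keys` into one combined category."""
    return sep.join(str(row[k]) for k in keys)


def quantile_bucketize(values, num_buckets):
    """Label each value with a 1-based bucket chosen by its rank.

    Ranking (rather than fixed-width ranges) keeps the buckets roughly
    equal in size. Tied values may land in different buckets in this
    simplified version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        bucket = rank * num_buckets // len(values) + 1
        labels[i] = f"bin_{bucket}"
    return labels


# Hypothetical sample rows, invented for illustration.
rows = [
    {"country": "CA", "language": "fr", "income": 30_000},
    {"country": "CA", "language": "en", "income": 10_000},
    {"country": "US", "language": "en", "income": 20_000},
    {"country": "US", "language": "es", "income": 50_000},
    {"country": "MX", "language": "es", "income": 40_000},
]

crossed = [feature_cross(r, ["country", "language"]) for r in rows]
buckets = quantile_bucketize([r["income"] for r in rows], num_buckets=5)

for r, c, b in zip(rows, crossed, buckets):
    print(c, r["income"], b)
```

With five rows and five buckets, each row lands in its own bucket; on a real dataset each of the five classes would hold about a fifth of the rows.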
Author: LeetQuiz Editorial Team
As a junior data scientist at a government institution, you are tasked with preparing a dataset for a linear regression model used in demographic research. The dataset is stored in BigQuery and includes various demographic features. You need to avoid multicollinearity by combining the 'country' and 'language' features into a single categorical variable, and to categorize the 'income' feature into 5 quantile-based classes for better model performance. Given the constraints of data privacy and the need for scalable processing, which of the following BigQuery ML functions should you use? (Select 2)
A. ARRAY_CONCAT - Merges arrays into a single array, not suitable for creating categorical variables from distinct features.
B. QUANTILE_BUCKETIZE - Categorizes a continuous numerical feature into quantile-based buckets, ideal for dividing income into 5 classes.
C. FEATURE_CROSS - Creates a new feature by combining two or more input features, perfect for merging country and language into a single variable.
D. ST_AREA - Calculates the area in square meters of a GEOGRAPHY, irrelevant for demographic data categorization.