
Answer-first summary for fast verification
Answer: exclude_cols
The correct answer is **B. exclude_cols**. This parameter lists the columns that AutoML should ignore during model training and tuning, so that irrelevant or potentially problematic columns do not degrade the model's performance.

**How to use exclude_cols:**

```python
from databricks import automl

# Assuming your dataset is a Spark DataFrame named 'data'.
# classify() returns an AutoMLSummary describing the trials, not a fitted model.
summary = automl.classify(
    dataset=data,  # The input DataFrame is passed as 'dataset'
    target_col="label_column",  # Column containing the target labels
    exclude_cols=["irrelevant_column1", "irrelevant_column2"],  # Columns AutoML should ignore
    # … other AutoML parameters
)
```

**Incorrect Options:**

- **target_col**: Specifies the column holding the target labels to be predicted, not columns to exclude.
- **max_trials**: Controls the maximum number of model trials AutoML will run; it has no effect on column selection.
- **pos_label**: Used in binary classification to specify the positive label value, not columns to exclude.

**Key Points:**

- Use `exclude_cols` to narrow the feature set AutoML considers to the relevant columns.
- Dropping irrelevant columns can improve model performance and training efficiency.
- It is particularly useful for datasets with many columns, not all of which are relevant to the task.
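Before starting a run, it can help to check which columns would remain as candidate features once the target and the excluded columns are set aside. The sketch below is plain Python and not part of the Databricks AutoML API; the helper name `remaining_features` and the column names are hypothetical, chosen only for illustration.

```python
def remaining_features(columns, target_col, exclude_cols=()):
    """Return the columns left as candidate features after exclusions.

    Hypothetical pre-flight helper: it only inspects column names and
    does not call AutoML.
    """
    known = set(columns)
    if target_col not in known:
        raise ValueError(f"target column not found: {target_col!r}")

    # Catch typos in the exclusion list early.
    unknown = [c for c in exclude_cols if c not in known]
    if unknown:
        raise ValueError(f"unknown columns in exclude_cols: {unknown}")

    # The target must stay available for training.
    if target_col in exclude_cols:
        raise ValueError("target_col must not appear in exclude_cols")

    excluded = set(exclude_cols)
    # Preserve the original column order.
    return [c for c in columns if c != target_col and c not in excluded]


columns = ["customer_id", "signup_note", "age", "income", "label_column"]
features = remaining_features(
    columns,
    target_col="label_column",
    exclude_cols=["customer_id", "signup_note"],
)
print(features)
```

The same list passed as `exclude_cols` here could then be handed to `automl.classify`, assuming `columns` was taken from the DataFrame being trained on.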
Author: LeetQuiz Editorial Team
When training a classification model with Databricks AutoML, your dataset includes several columns that are not relevant to the classification task. Which parameter should you use to specify columns that AutoML should ignore during its calculations?
A. target_col
B. exclude_cols
C. max_trials
D. pos_label