
Explanation:
The correct answer is B. exclude_cols. This parameter is used to explicitly list columns that AutoML should disregard during model training and tuning. This ensures that irrelevant or potentially problematic columns do not adversely affect the model's performance.
How to use exclude_cols:
from databricks import automl

# Assuming your dataset is a Spark DataFrame named 'data'
# classify() returns a summary of the AutoML run rather than a single fitted model
summary = automl.classify(
    dataset=data,  # The DataFrame is passed through the 'dataset' parameter
    target_col="label_column",  # Specify the column containing the target labels
    exclude_cols=["irrelevant_column1", "irrelevant_column2"],  # Exclude specific columns
    # … other AutoML parameters
)
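Before starting a run, it can help to check that the exclusion list is sensible. The following is a minimal sketch, not part of the Databricks API: build_classify_kwargs is a hypothetical helper, and the column names are illustrative. It only checks that every excluded column exists and that the target column is not excluded.

```python
def build_classify_kwargs(all_columns, target_col, exclude_cols):
    """Hypothetical helper: validate the exclusion list and return keyword arguments."""
    known = set(all_columns)
    if target_col not in known:
        raise ValueError(f"target column {target_col!r} is not in the dataset")
    if target_col in exclude_cols:
        raise ValueError("the target column cannot be excluded")
    # Catch typos in the exclusion list before a run is started
    missing = [c for c in exclude_cols if c not in known]
    if missing:
        raise ValueError(f"columns not found in the dataset: {missing}")
    return {"target_col": target_col, "exclude_cols": list(exclude_cols)}


# Illustrative column names; on a Spark DataFrame these could come from data.columns
columns = ["label_column", "age", "income", "irrelevant_column1", "irrelevant_column2"]
kwargs = build_classify_kwargs(
    columns,
    target_col="label_column",
    exclude_cols=["irrelevant_column1", "irrelevant_column2"],
)
print(kwargs)
```

On Databricks, the returned dictionary could then be unpacked into the automl.classify call alongside the dataset.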
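Incorrect Options:
A. target_col names the column that holds the labels to predict; it does not exclude anything.
C. max_trials limits how many trials AutoML runs; it has no effect on which columns are used.
D. pos_label identifies the positive class for binary classification metrics such as precision and recall.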
Key Points:
Use exclude_cols to refine the feature set for AutoML, focusing on relevant columns.
When training a classification model with Databricks AutoML, your dataset includes several columns that are not relevant to the classification task. Which parameter should you use to specify columns that AutoML should ignore during its calculations?
A. target_col
B. exclude_cols
C. max_trials
D. pos_label