Databricks Certified Machine Learning - Associate

Explanation:

The correct answer is B. exclude_cols. This parameter is used to explicitly list columns that AutoML should disregard during model training and tuning. This ensures that irrelevant or potentially problematic columns do not adversely affect the model's performance.

How to use exclude_cols:

from databricks import automl
# Assuming your dataset is a Spark DataFrame named 'data'
# automl.classify returns a summary of the AutoML run, not a fitted model
summary = automl.classify(
    dataset=data,  # the first parameter of automl.classify is named 'dataset'
    target_col="label_column",  # Specify the column containing the target labels
    exclude_cols=["irrelevant_column1", "irrelevant_column2"],  # Columns AutoML should ignore
    # … other AutoML parameters
)

Incorrect Options:

A. target_col: Names the column that holds the labels to be predicted. It identifies the target and does not exclude anything.
C. max_trials: Caps the number of trials AutoML runs (deprecated in newer Databricks Runtime ML releases). It has no effect on which columns are used.
D. pos_label: Identifies the positive class in binary classification, which is used when computing metrics such as precision and recall. It does not exclude columns.
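
To make the distinction concrete, here is a minimal conceptual sketch in plain Python of how target_col and exclude_cols shape the candidate feature set. The helper feature_columns and the column names are hypothetical, and this is not AutoML's internal implementation; AutoML may drop or transform further columns on its own.

```python
def feature_columns(all_cols, target_col, exclude_cols=()):
    """Return the columns left as candidate features, preserving order.

    Hypothetical helper for illustration only; it mirrors the idea that the
    target column is never a feature and excluded columns are ignored.
    """
    excluded = set(exclude_cols)
    return [c for c in all_cols if c != target_col and c not in excluded]


# Illustrative column names, not taken from any real dataset.
cols = ["customer_id", "signup_ts", "age", "plan", "label_column"]

features = feature_columns(
    cols,
    target_col="label_column",
    exclude_cols=["customer_id", "signup_ts"],
)
print(features)
```

Parameters such as max_trials and pos_label never appear in this function because they do not influence which columns are considered.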

Key Points:

Use exclude_cols to narrow the feature set to the columns that are relevant to the task.
Dropping irrelevant columns (for example identifiers or free-form notes) can shorten training and keep noisy features out of the model.
It is particularly useful for wide datasets in which only some of the columns relate to the target.
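
With many columns, a misspelled name in the exclusion list is easy to miss. The sketch below checks the list against the dataset's column names before the AutoML call. The helper checked_exclude_cols is hypothetical, and how AutoML itself reacts to an unknown column name is not covered here; this check simply fails early.

```python
def checked_exclude_cols(all_cols, target_col, exclude_cols):
    """Validate an exclusion list against the dataset's column names.

    Hypothetical pre-flight check, not part of the databricks.automl API.
    """
    known = set(all_cols)
    missing = [c for c in exclude_cols if c not in known]
    if missing:
        raise ValueError(f"exclude_cols not found in dataset: {missing}")
    if target_col in exclude_cols:
        raise ValueError("target_col must not appear in exclude_cols")
    # Drop duplicates while keeping the caller's order.
    return list(dict.fromkeys(exclude_cols))


# Illustrative column names; on Databricks these would come from data.columns.
columns = ["customer_id", "age", "plan", "label_column"]
exclude = checked_exclude_cols(columns, "label_column", ["customer_id", "customer_id"])
print(exclude)

# On Databricks, with a DataFrame named 'data', the result would feed the call:
# summary = automl.classify(dataset=data, target_col="label_column", exclude_cols=exclude)
```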

When training a classification model with Databricks AutoML, your dataset includes several columns that are not relevant to the classification task. Which parameter should you use to specify columns that AutoML should ignore during its calculations?

A. target_col
B. exclude_cols
C. max_trials
D. pos_label