AWS Certified AI Practitioner

Explanation:

The company is currently in the Exploratory Data Analysis (EDA) stage of the machine learning pipeline. This conclusion is based on the specific activities described: creating a correlation matrix, calculating statistics, and visualizing the data. These are hallmark tasks of EDA, which focuses on understanding the data's structure, identifying patterns, detecting anomalies, and uncovering relationships between variables before proceeding to other stages.

Why C (Exploratory Data Analysis) is correct:

Correlation Matrix: A correlation matrix quantifies pairwise (typically linear) relationships between variables, which helps identify multicollinearity and candidate predictors. This is a key analytical step in EDA.
Statistical Calculations: Computing measures like mean, median, variance, or skewness provides insights into data distribution and central tendencies, essential for initial data understanding.
Data Visualization: Creating plots (e.g., histograms, scatter plots) allows for visual inspection of trends, outliers, and patterns, a core component of EDA.
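
The three EDA activities above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the toy dataset, its column names, and the helper functions are assumptions made up for the example, not part of the question. In practice a library such as pandas would typically handle these steps.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy dataset (not from the question): three numeric columns.
data = {
    "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0],
    "visits": [12.0, 24.0, 33.0, 41.0, 55.0],
    "returns": [5.0, 3.0, 4.0, 2.0, 1.0],
}


def summary_stats(values):
    """Statistical calculations: basic descriptive statistics for one column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Correlation matrix: pairwise Pearson correlations as nested dicts."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def text_histogram(values, bin_width):
    """Data visualization: a crude text histogram keyed by bin start."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return {start: "#" * n for start, n in sorted(counts.items())}


if __name__ == "__main__":
    for name, column in data.items():
        print(name, summary_stats(column))
    for name, row in correlation_matrix(data).items():
        print(name, {other: round(r, 2) for other, r in row.items()})
    for start, bar in text_histogram(data["ad_spend"], 20).items():
        print(f"{start:>3} | {bar}")
```

None of these steps changes the data or fits a model; they only describe it, which is what separates EDA from the later stages.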

Why other options are less suitable:

A (Data Pre-processing): This stage involves cleaning and preparing data for modeling, such as handling missing values, normalization, or encoding categorical variables. The described activities are analytical rather than preparatory: they inspect the data without changing it. Pre-processing typically follows EDA and acts on what EDA reveals.
B (Feature Engineering): This involves creating new features or transforming existing ones to improve model performance. While EDA informs feature engineering, the tasks mentioned are about analysis, not feature creation or modification.
D (Hyperparameter Tuning): This occurs during model training and optimization, where parameters are adjusted to enhance performance. It is unrelated to initial data analysis activities.
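
For contrast, here is a rough sketch of what the other three stages look like in code. Everything in it is hypothetical and chosen for illustration only: the column values, the helper names, and especially the stand-in scoring function passed to `grid_search`, which takes the place of a real validation metric. Unlike the scenario's tasks, each helper either modifies the data or selects a model setting.

```python
import statistics

# Hypothetical columns (not from the question); None marks a missing value.
raw_visits = [12.0, None, 33.0, 41.0, 55.0]
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]


def impute_median(values):
    """Pre-processing: replace missing entries with the column median."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Pre-processing: rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def visits_per_spend(visits, spend):
    """Feature engineering: derive a new ratio feature from two columns."""
    return [v / s for v, s in zip(visits, spend)]


def grid_search(candidates, evaluate):
    """Hyperparameter tuning: return the candidate with the highest score."""
    return max(candidates, key=evaluate)


if __name__ == "__main__":
    clean = impute_median(raw_visits)
    print("imputed:", clean)
    print("scaled:", min_max_scale(clean))
    print("new feature:", visits_per_spend(clean, ad_spend))
    # Stand-in score that peaks at k = 3; a real run would train and
    # validate a model for each candidate value instead.
    print("best k:", grid_search([1, 3, 5], lambda k: -(k - 3) ** 2))
```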

In summary, the company's focus on analyzing and visualizing data to gain insights aligns precisely with the objectives of Exploratory Data Analysis, making it the correct stage in the ML pipeline.

A company is developing a machine learning model. After gathering new data, they performed an analysis that included generating a correlation matrix, computing statistical measures, and creating visualizations. At which stage of the machine learning pipeline is the company currently operating?

Last updated: February 8, 2026 at 20:17

A. Data pre-processing (0.0%)
B. Feature engineering (9.1%)
C. Exploratory data analysis (90.9%)
D. Hyperparameter tuning