
A team building a machine learning solution for a financial services company is tasked with improving the accuracy of its credit risk prediction models. The dataset includes transaction histories, customer demographics, and credit scores, but it contains missing values, outliers, and inconsistent formatting. Which of the following steps is MOST critical in the data pre-processing phase to optimize the model's performance? Choose one.
A
Evaluating model performance using cross-validation techniques.
B
Making predictions or classifications directly on the raw data to identify patterns.
C
Cleaning, transforming, and preparing the data for modeling, including handling missing values and outliers.
D
Defining the problem statement and objectives without altering the dataset.
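
For reference, the three data problems named in the question (inconsistent formatting, missing values, outliers) can be handled along the lines of the sketch below. It is a minimal illustration only: the function name `clean_amounts`, the `amount` field, the sample records, median imputation, and the 1.5 x IQR clipping rule are all assumptions chosen for the example, not details given in the question.

```python
from statistics import median, quantiles


def clean_amounts(records, field="amount"):
    """Return one cleaned float per record for a numeric field.

    Hypothetical helper: the field name and the cleaning rules are
    illustrative assumptions, not part of the original question.
    """
    # 1. Normalize inconsistent formatting: drop currency symbols,
    #    thousands separators, and surrounding whitespace.
    parsed = []
    for rec in records:
        raw = rec.get(field)
        if raw is None or str(raw).strip() == "":
            parsed.append(None)  # treat empty strings as missing
        else:
            text = str(raw).replace("$", "").replace(",", "").strip()
            parsed.append(float(text))

    # 2. Impute missing values with the median of the observed values
    #    (the median is less sensitive to outliers than the mean).
    observed = [v for v in parsed if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in parsed]

    # 3. Clip outliers to the 1.5 * IQR fences computed from observed values.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, lower), upper) for v in filled]


# Made-up sample records showing all three problems.
sample = [
    {"amount": "$1,200.50"}, {"amount": " 950 "}, {"amount": None},
    {"amount": "1100"}, {"amount": "1,000,000"}, {"amount": "1050"},
    {"amount": "900"}, {"amount": ""}, {"amount": "1000"}, {"amount": "1150"},
]
cleaned = clean_amounts(sample)
```

In a real credit risk pipeline the imputation and outlier rules would be chosen per feature and fitted on the training split only, so that no information from the evaluation data leaks into the cleaning step.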