
You are tasked with developing a machine learning model to predict customer churn for a telecommunications company. The dataset includes customer demographics, service usage, and complaint history. The company emphasizes the importance of model interpretability to understand the factors influencing churn. Given these constraints, which step in the machine learning workflow is most critical for ensuring the selection of appropriate algorithms and techniques that align with the business's need for interpretability? Choose the best option.
A
Data collection: Gathering a comprehensive dataset that includes all potential predictors of churn, such as customer interactions and service downtime.
B
Model evaluation: Using metrics like accuracy and ROC-AUC to compare the performance of different models on a validation set.
C
Data preprocessing: Applying techniques like normalization and encoding to prepare the data for analysis, ensuring that the dataset is clean and formatted correctly.
D
Model selection: Choosing algorithms known for their interpretability, such as decision trees or logistic regression, over more complex models like deep neural networks.
E
Feature engineering: Creating new features that might better capture the reasons behind customer churn, such as the frequency of complaints or changes in service usage patterns.
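
To make concrete what option D means by an interpretable algorithm, here is a minimal sketch of logistic regression fitted by plain gradient descent, with the learned weights read off as the direction in which each feature moves the predicted churn odds. Everything in it is an assumption for illustration: the rows are made-up toy values, not the telecom dataset from the question, and the feature names, the fit_logistic helper, the learning rate, and the epoch count are all hypothetical.

```python
import math

# Made-up toy rows, for illustration only:
# (complaints last quarter, monthly usage hours) -> 1 if churned, 0 if stayed.
ROWS = [
    ((0, 9.0), 0), ((1, 8.0), 0), ((0, 7.5), 0), ((1, 6.0), 0),
    ((3, 3.0), 1), ((4, 2.0), 1), ((2, 2.5), 1), ((5, 4.0), 1),
]
FEATURES = ["complaints", "usage_hours"]  # hypothetical feature names


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.05, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


weights, bias = fit_logistic(ROWS)

# Each weight is the change in log-odds of churn per unit of that feature,
# so its sign can be read directly as "raises" or "lowers" churn risk.
for name, w in zip(FEATURES, weights):
    direction = "raises" if w > 0 else "lowers"
    print(f"{name}: weight {w:+.2f} ({direction} the predicted churn odds)")
```

A deep neural network trained on the same rows would spread this information across many hidden-layer weights with no comparable per-feature reading, which is the contrast the option draws.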