
Data transformation plays a pivotal role in preparing data for machine learning models. A team is working on a project to predict customer churn for a telecom company. The dataset includes customer demographics, service usage, and complaint history. The raw data is messy: it has missing values, inconsistent formats, and categorical variables that cannot be fed directly into machine learning algorithms. The team needs to preprocess this data to make it suitable for analysis. Which of the following best describes the process of 'data transformation' in this scenario? Choose the best option.
A
Expanding the dataset by integrating additional data sources such as social media activity to enhance predictive accuracy.
B
The process of gathering raw data from various internal and external sources to compile a comprehensive dataset.
C
Manipulating and converting raw data into a format that's ready for analysis, including handling missing values, normalizing numerical data, and encoding categorical variables.
D
The initial step of obtaining the raw data necessary for analysis from the company's databases and external APIs.
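
For readers who want to see the scenario's preprocessing concerns in concrete form, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the records, the column names (`age`, `monthly_minutes`, `plan`), and the `transform` helper are all hypothetical, and median imputation, min-max scaling, and one-hot encoding are just one common choice for each step, not the method any particular team or library prescribes.

```python
from statistics import median

# Hypothetical raw churn records; column names and values are illustrative only.
raw_rows = [
    {"age": 34, "monthly_minutes": 420.0, "plan": "prepaid"},
    {"age": None, "monthly_minutes": 130.0, "plan": "Postpaid "},
    {"age": 51, "monthly_minutes": None, "plan": "postpaid"},
    {"age": 27, "monthly_minutes": 610.0, "plan": None},
]

NUMERIC = ["age", "monthly_minutes"]   # assumed numeric columns
CATEGORICAL = ["plan"]                 # assumed categorical columns


def transform(rows):
    """Return cleaned copies of the rows: consistent labels, no missing
    values, numeric columns scaled to [0, 1], categories one-hot encoded."""
    rows = [dict(r) for r in rows]  # copy so the raw data is left untouched

    # 1. Fix inconsistent formats and missing categories.
    for r in rows:
        for col in CATEGORICAL:
            r[col] = "unknown" if r[col] is None else r[col].strip().lower()

    for col in NUMERIC:
        # 2. Fill missing numeric values with the median of the observed ones.
        fill = median(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = fill

        # 3. Min-max normalize the column to the range [0, 1].
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = hi - lo
        for r in rows:
            r[col] = (r[col] - lo) / span if span else 0.0

    # 4. One-hot encode each categorical column into 0/1 indicator columns.
    for col in CATEGORICAL:
        levels = sorted({r[col] for r in rows})
        for r in rows:
            value = r.pop(col)
            for level in levels:
                r[f"{col}_{level}"] = 1 if value == level else 0

    return rows


features = transform(raw_rows)
for row in features:
    print(row)
```

In practice a team would more likely reach for a dataframe or ML preprocessing library, and would fit the imputation and scaling statistics on training data only; the sketch skips both to stay self-contained.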