
In the context of preparing a dataset for machine learning, a team is tasked with ensuring the highest data quality before model training. The dataset includes a mix of numerical and categorical data, with some missing values and potential outliers. The team decides to perform Exploratory Data Analysis (EDA) as a preliminary step. Considering the need for cost efficiency, compliance with data privacy regulations, and scalability for large datasets, which of the following is the primary purpose of EDA in this scenario? Choose one correct option.
A. To directly deploy the machine learning models into production without further adjustments.
B. To fine-tune the hyperparameters of the machine learning models to achieve optimal performance.
C. To identify patterns, anomalies, and insights in the data that could affect model accuracy and reliability.
D. To automate the entire data processing pipeline, eliminating the need for manual intervention.