
Answer-first summary for fast verification
Answer: Pandas, for its comprehensive data structures and functions designed for data manipulation and analysis.
**Correct Answer: B. Pandas**

**Explanation:** Pandas is the most suitable library for this scenario due to its powerful data manipulation and exploration capabilities, especially with tabular data. It provides intuitive data structures like DataFrames and Series, which are ideal for handling the described dataset. Pandas excels in:

- **Data cleaning and preprocessing:** Efficiently handling missing values, outliers, and data inconsistencies.
- **Data exploration:** Offering tools for generating summary statistics and initial data visualization.
- **Feature extraction:** Simplifying the process of creating new features from existing data.

**Incorrect Options:**

- **A. Matplotlib:** While excellent for data visualization, it lacks the data manipulation capabilities needed for preprocessing.
- **C. TensorFlow:** Primarily used for building and training machine learning models, not for data preprocessing.
- **D. NumPy:** Focuses on numerical computations and is less suited for the comprehensive data manipulation tasks required here.
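The three Pandas strengths named in the explanation (cleaning, exploration, feature extraction) can be illustrated with a minimal sketch. The column names `customer_id` and `amount`, the sample values, the median fill, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration; they are not part of the question, and a real pipeline may call for different choices.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction records; column names and values are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "amount": [20.0, np.nan, 35.0, 5000.0, 15.0, 25.0],
})

# Data cleaning: fill the missing amount with the column median (one common choice).
df["amount"] = df["amount"].fillna(df["amount"].median())

# Outlier handling: flag amounts outside 1.5 * IQR beyond the quartiles (assumed rule).
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)

# Data exploration: summary statistics for the cleaned column.
summary = df["amount"].describe()

# Feature extraction: per-customer aggregates computed on non-outlier rows.
features = (
    df[~df["is_outlier"]]
    .groupby("customer_id")["amount"]
    .agg(total_spent="sum", mean_spent="mean", n_transactions="count")
    .reset_index()
)

print(summary)
print(features)
```

Each step here is a single labeled-column operation on a DataFrame, which is the kind of tabular workflow the question describes. The same steps in plain NumPy would need manual bookkeeping of column positions and group boundaries.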
Author: LeetQuiz Editorial Team
While developing a machine learning model for a financial services company, you are tasked with preprocessing a large dataset of customer transaction records. The dataset contains missing values and outliers, and it requires extensive manipulation to extract meaningful features. Given the need for efficient manipulation and exploration of tabular data, which Python library would you primarily use to address these challenges? Choose the best option.
A. Matplotlib, for its superior data visualization capabilities that can help in identifying outliers and trends.
B. Pandas, for its comprehensive data structures and functions designed for data manipulation and analysis.
C. TensorFlow, for its advanced capabilities in building and training deep learning models on the processed data.
D. NumPy, for its efficient numerical computations and array manipulations that can speed up data preprocessing.