
In the context of developing a machine learning model for a fraud detection system, where the dataset is highly imbalanced with fraud cases representing less than 1% of all transactions, which of the following model types would be most effective at accurately identifying fraudulent transactions while also remaining computationally efficient and scalable? Choose the best option.
A
Naive Bayes, due to its simplicity and speed in training, making it suitable for large datasets.
B
Support Vector Machine (SVM), with a linear kernel for its ability to handle high-dimensional data efficiently.
C
Ensemble methods like Random Forest, for their robustness to imbalanced datasets through aggregating predictions from multiple decision trees.
D
Decision Tree, for its interpretability and ease of tuning to focus on the minority class.
E
Both C (Ensemble methods like Random Forest) and D (Decision Tree), as combining their strengths can offer a balanced approach to handling imbalanced datasets.
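This example shows why the imbalance in the question matters whichever option is chosen. It assumes a hypothetical dataset of 10,000 transactions with 80 fraud cases (0.8%); these figures are illustrative only. A trivial model that never flags fraud still reaches 99.2% accuracy while catching none of the fraudulent transactions. The `evaluate` helper is hypothetical and computes the metrics by hand.

```python
# Hypothetical figures, chosen only to illustrate a <1% fraud rate.
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for the positive (fraud) class, label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

n_total = 10_000
n_fraud = 80  # 0.8% fraud rate (assumed)
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# Trivial baseline: label every transaction as legitimate.
y_pred = [0] * n_total

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# → accuracy=0.992 precision=0.000 recall=0.000
```

Compare candidate models on minority-class precision and recall, not overall accuracy. Accuracy alone rewards a model that ignores fraud entirely.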