
Answer-first summary for fast verification
Answer: C. Ensemble methods like Random Forest, for their robustness to imbalanced datasets through aggregating predictions from multiple decision trees.
Correct Option: C. Ensemble methods like Random Forest

Explanation: Ensemble methods such as Random Forest are particularly effective for imbalanced datasets like those in fraud detection. They aggregate the predictions of multiple decision trees, each trained on a different bootstrap sample of the data. This aggregation mitigates the effects of class imbalance, improving the model's ability to predict the minority class (fraudulent transactions) without significantly compromising computational efficiency or scalability.

Why the other options are less suitable:
- A. Naive Bayes: While fast and simple, it often underperforms on imbalanced datasets because it does not adequately capture the nuances of the minority class.
- B. Support Vector Machine (SVM): Although powerful for classification, SVMs do not inherently address class imbalance and can be computationally intensive on very large datasets.
- D. Decision Tree: A single decision tree is prone to bias toward the majority class in imbalanced scenarios, making it less ideal despite its interpretability.
- E. Combining C and D: Random Forest already incorporates the benefits of multiple decision trees, so adding a single Decision Tree offers no extra advantage in this scenario.
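The explanation above can be illustrated with a minimal sketch using scikit-learn (an assumption; the question names no library). The synthetic dataset, the 1% positive rate, and the `class_weight="balanced"` setting are illustrative choices, not part of the original question:

```python
# Sketch: Random Forest on a highly imbalanced dataset, mirroring the
# fraud-detection scenario (~1% positive class).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic transactions: ~1% "fraud" (class 1), ~99% "legitimate" (class 0).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.99, 0.01],
    flip_y=0, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42,
)

# class_weight="balanced" re-weights samples toward the minority class inside
# each tree; the forest then aggregates (majority-votes) the trees' predictions.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42,
)
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
print("minority-class recall:", recall_score(y_te, pred))
```

Stratified splitting keeps the rare class present in both halves; in practice one would also tune the decision threshold and evaluate with precision-recall rather than accuracy, which is misleading at this imbalance level.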
Author: LeetQuiz Editorial Team
In the context of developing a machine learning model for a fraud detection system, where the dataset is highly imbalanced with fraud cases representing less than 1% of all transactions, which of the following model types would be most effective in accurately identifying fraudulent transactions while also considering computational efficiency and scalability? Choose the best option.
A. Naive Bayes, due to its simplicity and speed in training, making it suitable for large datasets.
B. Support Vector Machine (SVM), with a linear kernel for its ability to handle high-dimensional data efficiently.
C. Ensemble methods like Random Forest, for their robustness to imbalanced datasets through aggregating predictions from multiple decision trees.
D. Decision Tree, for its interpretability and ease of tuning to focus on the minority class.
E. Both C (Ensemble methods like Random Forest) and D (Decision Tree), as combining their strengths can offer a balanced approach to handling imbalanced datasets.