
Answer: Boosted Tree - XGBoost, Random Forest
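Given the requirements, the most suitable ML models for the fraud detection system are:

**Boosted Tree - XGBoost (C):**
- An ensemble method that trains trees sequentially. Each new tree corrects the errors of the trees before it.
- Known for high accuracy and fast training and inference on tabular data.
- Includes regularization to prevent overfitting.
- Handles missing values natively, which helps with incomplete transaction data.

**Random Forest (E):**
- Combines many decision trees trained on bootstrapped samples, which makes it robust to noise and outliers.
- Provides feature importance scores that help identify indicators of fraud.
- Works with both numerical and categorical features. Some implementations require categorical features to be encoded first.
- Less prone to overfitting than a single decision tree.

Both models output a class probability for each transaction, which meets the "probability of fraud" requirement. Both also score a transaction quickly enough for real-time use. Both are supervised, so they still need labeled examples of fraud. However, they work well on tabular features with little preprocessing (for example, no feature scaling), which keeps software development effort low. They can be retrained periodically on the aggregated data to track changing transaction patterns.

The other options fit less well:
- **K-means (D)** is a clustering algorithm. It groups transactions by similarity, but it does not explicitly model fraud versus non-fraud, and it does not output a fraud probability.
- **Decision Tree (A)** on its own tends to overfit and is usually less accurate on complex patterns than the ensembles built from it.
- **Matrix Factorization (B)** is mainly used for recommendation (collaborative filtering), not for classifying individual transactions.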
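To show how each ensemble turns its trees into a per-transaction fraud probability, here is a minimal sketch. The "trees" are hypothetical hand-written stand-ins with made-up thresholds, not learned models. The function names (`rf_tree_1`, `boosted_proba`, and so on) and the feature names are illustrative assumptions. It mirrors two documented behaviors. First, scikit-learn's `RandomForestClassifier.predict_proba` averages the per-tree class probabilities. Second, XGBoost's `binary:logistic` objective sums leaf scores into a margin and applies the logistic (sigmoid) function.

```python
import math

# Hypothetical Random Forest trees: each leaf value is the fraction of fraud
# cases that landed in that leaf, i.e. a per-tree probability.
def rf_tree_1(tx):
    return 0.9 if tx["amount"] > 1000 else 0.1

def rf_tree_2(tx):
    return 0.8 if tx["foreign"] else 0.2

def rf_tree_3(tx):
    return 0.7 if tx["txn_last_hour"] > 5 else 0.05

def random_forest_proba(trees, tx):
    """Random Forest: average the independent per-tree probabilities."""
    return sum(tree(tx) for tree in trees) / len(trees)

# Hypothetical boosted trees: each leaf value is an additive correction in
# log-odds space, so it can be negative and is not itself a probability.
def xgb_tree_1(tx):
    return 1.2 if tx["amount"] > 1000 else -0.4

def xgb_tree_2(tx):
    return 0.8 if tx["foreign"] else -0.3

def boosted_proba(trees, tx, base_margin=0.0):
    """Boosted trees (logistic objective): sum leaf scores, then apply the sigmoid.

    base_margin is an assumed starting log-odds, set to 0 here for simplicity.
    """
    margin = base_margin + sum(tree(tx) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))

tx = {"amount": 2500, "foreign": True, "txn_last_hour": 2}
print(round(random_forest_proba([rf_tree_1, rf_tree_2, rf_tree_3], tx), 3))
print(round(boosted_proba([xgb_tree_1, xgb_tree_2], tx), 3))
```

The contrast is the design point. A forest averages independent votes, while boosting accumulates sequential corrections in log-odds space. A clustering model such as K-means has no equivalent step: it yields a distance to a centroid, not a fraud probability.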
Author: LeetQuiz Editorial Team
Your team is developing a real-time fraud detection system for a major international bank. The system must meet the following requirements: process transactions in real time from various banking applications in a standardized format, store data in real time with some statistical aggregations, periodically train an ML model for outlier detection, provide a probability of fraud for each transaction, and minimize labeling and software development efforts. Additionally, the system must comply with international banking regulations, handle high transaction volumes without significant latency, and be scalable to accommodate future growth. Given these constraints, which two ML models would best meet these requirements? Choose two correct options.
- A. Decision Tree
- B. Matrix Factorization
- C. Boosted Tree - XGBoost
- D. K-means
- E. Random Forest
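The question asks for data to be stored in real time with statistical aggregations. A common way to keep such aggregates cheap is to update them incrementally per account. The minimal sketch below uses Welford's online algorithm for a running mean and variance and turns them into an amount z-score feature that a tree model could consume. The class and function names (`RunningStats`, `amount_zscore`) and the in-memory dictionary are hypothetical. A production system would keep this state in a stream processor or a low-latency store.

```python
import math
from collections import defaultdict

class RunningStats:
    """Welford's online algorithm: O(1) memory and O(1) work per update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; reported as 0.0 until two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

# Hypothetical in-memory aggregate store keyed by account ID.
stats_by_account = defaultdict(RunningStats)

def amount_zscore(account_id, amount):
    """Score a transaction against the account's history, then fold it in."""
    stats = stats_by_account[account_id]
    sd = stats.std()
    z = (amount - stats.mean) / sd if sd > 0 else 0.0
    stats.update(amount)
    return z

for amt in [20.0, 25.0, 30.0, 22.0, 28.0]:
    amount_zscore("acct-1", amt)
print(round(amount_zscore("acct-1", 500.0), 1))
```

The transaction is scored before it is added to the aggregate. This prevents an unusual amount from diluting its own anomaly signal.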