
Answer-first summary for fast verification
Answer: Oversample the fraudulent transactions 10 times.
The correct answer is C. Only 1% of the transactions are fraudulent, so a classifier trained on the raw data sees very few positive examples. Oversampling the fraudulent transactions 10 times raises their share of the training data from 1% to roughly 9%, which reduces the imbalance and improves the model's ability to detect fraud. Writing the data in TFRecords (A) only changes the storage format, Z-normalizing numeric features (B) has little effect on a tree-based model such as a random forest, and one-hot encoding categorical features (D) only changes how features are represented. These can be useful preprocessing steps, but none of them addresses the class imbalance.
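As a rough illustration of what 10x oversampling does to the class ratio, here is a minimal sketch in plain Python. The function name `oversample_minority`, its parameters, and the toy data are hypothetical and chosen only for this example; the question itself does not prescribe an implementation.

```python
import random


def oversample_minority(rows, labels, minority_label=1, factor=10, seed=0):
    """Repeat each minority-class row so it appears `factor` times in total.

    Hypothetical helper for illustration; majority-class rows are kept once.
    """
    out_rows, out_labels = [], []
    for row, label in zip(rows, labels):
        copies = factor if label == minority_label else 1
        out_rows.extend([row] * copies)
        out_labels.extend([label] * copies)

    # Shuffle so the duplicated rows are not grouped together.
    order = list(range(len(out_rows)))
    random.Random(seed).shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]


# Toy data (assumed): 1,000 transactions, 1% labeled fraudulent (label 1).
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(1000)]

new_rows, new_labels = oversample_minority(rows, labels)
fraud_share = sum(new_labels) / len(new_labels)
print(f"fraud share after oversampling: {fraud_share:.3f}")  # 100 / 1090, about 0.092
```

With 10 fraudulent and 990 legitimate rows, the result has 100 fraudulent rows out of 1,090, so the classes are less skewed but still not equal. In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.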
Author: LeetQuiz Editorial Team
As a data scientist working for a bank, you are tasked with building a random forest model to detect fraudulent transactions. Your dataset contains transactional data, with only 1% of the transactions labeled as fraudulent. Due to the class imbalance, you need to choose an appropriate data transformation strategy to enhance the performance of your classifier. Which data transformation strategy should you use?
A. Write your data in TFRecords.
B. Z-normalize all the numeric features.
C. Oversample the fraudulent transactions 10 times.
D. Use one-hot encoding on all categorical features.