Google Professional Machine Learning Engineer


As a data scientist working for a bank, you are tasked with building a random forest model to detect fraudulent transactions. Your dataset contains transactional data, with only 1% of the transactions labeled as fraudulent. Due to the class imbalance, you need to choose an appropriate data transformation strategy to enhance the performance of your classifier. Which data transformation strategy should you use?




Explanation:

The correct answer is C. Only 1% of the transactions are fraudulent, so a classifier trained on the raw data sees very few positive examples and can reach 99% accuracy simply by predicting "not fraud" for every transaction. Oversampling the fraudulent transactions 10 times raises their share of the training data from 1% to roughly 9% (10 / 109), which reduces the imbalance and improves the model's ability to detect fraud. Writing the data in TFRecords (A), Z-normalizing numeric features (B), and one-hot encoding categorical features (D) are useful preprocessing techniques in other contexts, but none of them changes the class distribution, so they do not address the imbalance.
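To make the effect of 10x oversampling concrete, here is a minimal sketch in plain Python. The function name `oversample_minority`, its parameters, and the toy dataset are assumptions made for illustration; they are not part of the original question, and a real pipeline would more likely use a library resampler or class weights.

```python
import random


def oversample_minority(rows, labels, minority_label=1, factor=10, seed=0):
    """Return shuffled copies of rows/labels with each minority row repeated `factor` times.

    Hypothetical helper for illustration only.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    pairs = []
    for row, label in zip(rows, labels):
        # Majority rows are kept once; minority rows are duplicated `factor` times.
        copies = factor if label == minority_label else 1
        pairs.extend([(row, label)] * copies)
    # Shuffle so the duplicated rows are not clustered together.
    random.Random(seed).shuffle(pairs)
    new_rows = [row for row, _ in pairs]
    new_labels = [label for _, label in pairs]
    return new_rows, new_labels


# Toy data (assumed): 1,000 transactions, 1% labeled fraudulent (label 1).
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(1000)]

X_res, y_res = oversample_minority(rows, labels)
fraud_share = sum(y_res) / len(y_res)
print(len(y_res), round(fraud_share, 4))  # 1090 0.0917
```

Oversampling should be applied only to the training split, after the train/validation split has been made. Otherwise copies of the same fraudulent transaction can land in both splits and inflate the validation metrics.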