
As a global retailer, you are tasked with building ML models to predict customer preferences while protecting sensitive customer data. The data includes four sensitive fields: AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE. Given the need to maintain data utility for model training and to comply with global data protection regulations, which of the following steps should you take with this data before it is used by the data science team? Choose two correct options.
A
Apply differential privacy techniques to add noise to the sensitive fields, ensuring individual data points cannot be traced back while preserving the overall data distribution.
B
Remove all sensitive data fields entirely, instructing the data science team to rely solely on non-sensitive data for model building, despite potential loss in model accuracy.
C
Tokenize each sensitive field by replacing real values with hashed dummy values, ensuring the original data is not exposed while remaining usable for model training.
D
Simplify the data by categorizing AGE into quantiles and reducing LATITUDE_LONGITUDE precision to a single decimal place, arguing that the remaining fields are already simplified as much as possible.
E
Encrypt the sensitive fields using a secure encryption algorithm before sharing them with the data science team, requiring decryption keys for access during model training.
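
For context, a minimal sketch of the differential-privacy approach in option A, assuming a pandas DataFrame named df with an AGE column; the epsilon value and the assumed age range (and hence the sensitivity) are illustrative, not values given in the question.

import numpy as np
import pandas as pd

def add_laplace_noise(series, sensitivity, epsilon):
    # Laplace mechanism: noise scale b = sensitivity / epsilon.
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=len(series))
    return series + noise

df = pd.DataFrame({"AGE": [23, 35, 41, 58, 62]})  # toy data for illustration
# Assume ages fall in [18, 100], so the per-record range (sensitivity) is 82.
df["AGE_DP"] = add_laplace_noise(df["AGE"], sensitivity=82, epsilon=1.0)
print(df)

Smaller epsilon means stronger privacy but noisier values, so the choice is a utility trade-off the team would tune.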
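A minimal sketch of the hashing-based tokenization in option C, again assuming a pandas DataFrame; the salt string and the list of fields tokenized here are illustrative assumptions.

import hashlib
import pandas as pd

SALT = "example-salt"  # illustrative; in practice keep the salt in a secrets manager

def tokenize(value):
    # Replace the real value with a deterministic hashed token.
    return hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()[:16]

df = pd.DataFrame({"SHIRT_SIZE": ["S", "M", "L"],
                   "IS_EXISTING_CUSTOMER": [True, False, True]})
for field in ["SHIRT_SIZE", "IS_EXISTING_CUSTOMER"]:
    df[field] = df[field].map(tokenize)
print(df)

Because the same input always maps to the same token, the columns remain usable as categorical features even though the raw values are hidden.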
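A minimal sketch of the coarsening described in option D, assuming LATITUDE_LONGITUDE is stored as a "lat,lon" string; the number of quantile buckets is an illustrative assumption.

import pandas as pd

df = pd.DataFrame({
    "AGE": [23, 35, 41, 58, 62, 29, 47, 71],
    "LATITUDE_LONGITUDE": ["40.712776,-74.005974", "34.052235,-118.243683",
                           "51.507351,-0.127758", "48.856613,2.352222",
                           "35.689487,139.691711", "55.755825,37.617298",
                           "-33.868820,151.209290", "19.432608,-99.133209"],
})

# Bucket AGE into quartiles instead of keeping exact ages.
df["AGE_BUCKET"] = pd.qcut(df["AGE"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

def coarsen_coords(value):
    # Keep one decimal place (roughly 11 km of latitude precision) per coordinate.
    lat, lon = (float(x) for x in value.split(","))
    return f"{lat:.1f},{lon:.1f}"

df["LATITUDE_LONGITUDE"] = df["LATITUDE_LONGITUDE"].map(coarsen_coords)
print(df[["AGE_BUCKET", "LATITUDE_LONGITUDE"]])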
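A minimal sketch of the field-level encryption in option E, using the third-party cryptography package's Fernet recipe as one possible choice; key generation in code is only for illustration, and key management details are assumptions.

import pandas as pd
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()  # in practice, manage keys in a KMS, not in code
fernet = Fernet(key)

df = pd.DataFrame({"AGE": [23, 35], "SHIRT_SIZE": ["M", "L"]})

for field in ["AGE", "SHIRT_SIZE"]:
    df[field] = df[field].map(lambda v: fernet.encrypt(str(v).encode()).decode())

# The data science team would need the key to decrypt values during model training.
decrypted_age = fernet.decrypt(df.loc[0, "AGE"].encode()).decode()
print(decrypted_age)  # "23"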