
Answer-first summary for fast verification
Answer: Tokenize all of the fields using hashed dummy values to replace the real values.
The correct answer is A. Tokenizing all of the fields with hashed dummy values hides the real sensitive customer data while preserving the structure the data science team needs for model training: deterministic tokens keep equal values equal, so joins, group-bys, and categorical features still work, while the one-way hash prevents recovery of the original values. Options B and C transform or coarsen the data but do not adequately protect it: PCA discards information needed for training without guaranteeing irreversibility, and rounded coordinates or age quantiles can still leave identifiable traces. Option D, removing all sensitive fields outright, would hurt model accuracy by discarding potentially important predictive variables.
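A minimal sketch of option A in Python. The field names come from the question; the secret key and the sample record are made-up illustrations. A keyed hash (HMAC-SHA256) is used rather than a bare SHA-256 so that low-cardinality fields such as SHIRT_SIZE cannot be reversed with a simple dictionary attack; in practice the key would live in a secrets manager, not in source code.

```python
import hmac
import hashlib

# Fields flagged as sensitive by the data science team (from the question).
SENSITIVE_FIELDS = {"AGE", "IS_EXISTING_CUSTOMER", "LATITUDE_LONGITUDE", "SHIRT_SIZE"}

# Assumption: in a real pipeline this key comes from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a real value with a deterministic keyed-hash token.

    Determinism matters: the same input always maps to the same token,
    so equality-based features and joins survive tokenization.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_record(record: dict) -> dict:
    """Tokenize only the sensitive fields, leaving the rest untouched."""
    return {
        field: tokenize(str(value)) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


# Hypothetical customer record for illustration.
record = {
    "AGE": 34,
    "IS_EXISTING_CUSTOMER": True,
    "LATITUDE_LONGITUDE": "51.5074,-0.1278",
    "SHIRT_SIZE": "M",
    "ORDER_COUNT": 7,  # non-sensitive, passed through unchanged
}
safe = tokenize_record(record)
```

Note that tokens are fixed-length hex strings, so downstream schemas should treat the tokenized columns as opaque strings rather than their original types.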
Author: LeetQuiz Editorial Team
You work for a global clothing retailer and have been assigned the task of ensuring that machine learning models are built in a secure and compliant manner. Customer data used in these models contains sensitive information that needs to be protected. The specific fields identified as sensitive by your data science team are AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE. What steps should you take to safeguard this sensitive data before making it available for model training?
A
Tokenize all of the fields using hashed dummy values to replace the real values.
B
Use principal component analysis (PCA) to reduce the four sensitive fields to one PCA vector.
C
Coarsen the data by putting AGE into quantiles and rounding LATITUDE_LONGITUDE into single precision. The other two fields are already as coarse as possible.
D
Remove all sensitive data fields, and ask the data science team to build their models using non-sensitive data.