
Answer-first summary for fast verification
Answer: (A) Utilize the Cloud Data Loss Prevention (DLP) API to identify sensitive data, then apply Dataflow with the DLP API to encrypt sensitive values using Format Preserving Encryption. (E) Implement both Format Preserving Encryption for columns used in model training and AES-256 encryption for columns not directly used in training but required for compliance.
The strongest approach is to use the Cloud DLP API to detect sensitive data, then run Dataflow with the DLP API to apply Format Preserving Encryption (FPE). FPE ciphertext keeps the length and character set of the original value, so encrypted columns stay usable for training while the underlying PII is protected. Applying AES-256 encryption to compliance-related columns that are not used in training protects them without affecting model performance. Filtering out sensitive columns removes features the model needs, and shuffling values within a column breaks the link between each value and the rest of its row, so both can harm model performance. Relying solely on AES-256 encryption without format preservation may render the data unusable for training.
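To make the FPE step more concrete, here is a minimal sketch of the de-identification configuration that a Dataflow step might pass to the DLP API. It only builds the request fragment as a plain dictionary and does not call any service. The helper name `build_fpe_deidentify_config`, the column names, and the key path are hypothetical; the field names follow the DLP `DeidentifyConfig` message as I understand it, so check them against the current API reference before relying on them.

```python
from typing import Any, Dict, List


def build_fpe_deidentify_config(
    columns: List[str],
    kms_key_name: str,
    wrapped_key: bytes,
    alphabet: str = "NUMERIC",
) -> Dict[str, Any]:
    """Build a DLP deidentify_config dict that applies FPE to table columns.

    Assumptions (not from the original text):
      - columns: names of the sensitive BigQuery columns, e.g. found by a
        prior DLP inspection pass.
      - kms_key_name / wrapped_key: a Cloud KMS key resource name and the
        data key wrapped by it.
      - alphabet: the character set shared by plaintext and ciphertext.
    """
    fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": kms_key_name,
            }
        },
        # Ciphertext is drawn from the same alphabet as the input, which is
        # what preserves the format of the column.
        "common_alphabet": alphabet,
    }
    return {
        "record_transformations": {
            "field_transformations": [
                {
                    # One transformation applied to every listed column.
                    "fields": [{"name": name} for name in columns],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": fpe_config
                    },
                }
            ]
        }
    }
```

In a pipeline, the returned dictionary would typically be supplied as the `deidentify_config` of a `deidentify_content` request made with the `google-cloud-dlp` client, alongside the table rows to transform. Values in the chosen columns are expected to contain only characters from the selected alphabet, so a numeric alphabet suits digit-only identifiers rather than free text.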
Author: LeetQuiz Editorial Team
In the context of training a machine learning model with data stored in BigQuery that includes Personally Identifiable Information (PII), and considering all columns are crucial for the model's performance, what are the best practices to ensure privacy without diminishing the data's utility? Choose two correct options.
A
Utilize the Cloud Data Loss Prevention (DLP) API to identify sensitive data, then apply Dataflow with the DLP API to encrypt sensitive values using Format Preserving Encryption.
B
Before training, filter out columns containing sensitive data in BigQuery and establish an authorized view to restrict access to sensitive information.
C
Employ Dataflow to extract columns with sensitive data from BigQuery and then randomly shuffle the values within each sensitive column.
D
Scan for sensitive data using the Cloud Data Loss Prevention (DLP) API, then use Dataflow to replace all sensitive data with AES-256 encryption, including a salt.
E
Implement both Format Preserving Encryption for columns used in model training and AES-256 encryption for columns not directly used in training but required for compliance.