
Answer-first summary for fast verification
Answer: Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
Option B is correct: the Cloud Data Loss Prevention (DLP) API identifies sensitive values, and Dataflow applies Format Preserving Encryption (FPE) through the DLP API. FPE encrypts PII while preserving each value's format and statistical distribution, so the dataset remains usable for model training. Option A (randomization) destroys the distributions the model needs to learn from. Option C (AES-256 with a salt) does not preserve format, so the encrypted values break the column schema and become unusable as features. Option D is invalid because every column is essential to the model, and an authorized view only restricts who can read the data; it does not reduce the data's sensitivity.
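In practice, a Dataflow pipeline sends record batches to DLP's `content.deidentify` method with a de-identification configuration. Below is a minimal sketch of such a configuration, assuming a KMS-wrapped data key already exists; the info type, key names, and helper function are illustrative placeholders, not a definitive implementation:

```python
def build_fpe_deidentify_config(wrapped_key_b64: str, kms_key_name: str) -> dict:
    """Build a DLP request body that applies Format Preserving Encryption
    (cryptoReplaceFfxFpeConfig) to detected US Social Security numbers.

    Both arguments are placeholders supplied by the caller; the wrapped key
    would come from Cloud KMS in a real pipeline.
    """
    return {
        "inspectConfig": {
            # Which info types DLP should detect before transforming.
            "infoTypes": [{"name": "US_SOCIAL_SECURITY_NUMBER"}]
        },
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                        "primitiveTransformation": {
                            "cryptoReplaceFfxFpeConfig": {
                                "cryptoKey": {
                                    "kmsWrapped": {
                                        "wrappedKey": wrapped_key_b64,
                                        "cryptoKeyName": kms_key_name,
                                    }
                                },
                                # NUMERIC keeps digits as digits, so an
                                # encrypted SSN still has the shape of an SSN.
                                "commonAlphabet": "NUMERIC",
                            }
                        },
                    }
                ]
            }
        },
    }


# Example: assemble the request body with placeholder key references.
config = build_fpe_deidentify_config(
    wrapped_key_b64="<base64-wrapped-key>",  # placeholder value
    kms_key_name="projects/p/locations/l/keyRings/r/cryptoKeys/k",  # placeholder
)
```

In a Dataflow pipeline, a `DoFn` would attach this body to each `projects.content.deidentify` call (via the `google-cloud-dlp` client), so values are encrypted in flight before the training dataset is written back to BigQuery.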
Author: LeetQuiz Editorial Team
You are training a machine learning model with data from BigQuery that contains Personally Identifiable Information (PII). You must reduce the dataset's sensitivity for training, but all columns are essential for the model. What is the correct approach?
A
Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.
B
Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
C
Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.
D
Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.