Google Professional Machine Learning Engineer

You are developing a machine learning model using data stored in Google BigQuery. This dataset contains several values classified as Personally Identifiable Information (PII), such as names, addresses, and social security numbers. To comply with data privacy laws and reduce the sensitivity of the dataset before using it for training, you need to anonymize or mask these sensitive columns without removing them, as every column is critical for the model's performance. How should you proceed?

Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.

Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.

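A minimal sketch of the request behind the "DLP API with Format Preserving Encryption" option. It assumes the Python client library `google-cloud-dlp` and a data key wrapped by Cloud KMS. The project ID, key names, and the surrogate name `SSN_TOKEN` are hypothetical placeholders. The function only builds the request dictionary; the actual API call, shown as a comment, needs credentials and a real KMS key. In a Dataflow pipeline, a step like this would typically run inside a transform over batches of rows rather than on one string at a time.

```python
from typing import Any, Dict

# Built-in DLP detector for US Social Security numbers.
INFO_TYPE = "US_SOCIAL_SECURITY_NUMBER"


def build_fpe_deidentify_request(
    parent: str, kms_key_name: str, wrapped_key: bytes, text: str
) -> Dict[str, Any]:
    """Build a deidentify_content request that replaces detected SSNs
    with format-preserving (FFX) ciphertext over a numeric alphabet."""
    crypto_key = {
        # Data key encrypted ("wrapped") by a Cloud KMS key (hypothetical name).
        "kms_wrapped": {"wrapped_key": wrapped_key, "crypto_key_name": kms_key_name}
    }
    fpe_config = {
        "crypto_key": crypto_key,
        # Digits in, digits out: the token keeps the numeric format.
        "common_alphabet": "NUMERIC",
        # Hypothetical surrogate name that marks tokens for later re-identification.
        "surrogate_info_type": {"name": "SSN_TOKEN"},
    }
    return {
        "parent": parent,
        "inspect_config": {"info_types": [{"name": INFO_TYPE}]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [{"name": INFO_TYPE}],
                        "primitive_transformation": {
                            "crypto_replace_ffx_fpe_config": fpe_config
                        },
                    }
                ]
            }
        },
        "item": {"value": text},
    }


if __name__ == "__main__":
    request = build_fpe_deidentify_request(
        parent="projects/my-project/locations/global",  # hypothetical project
        kms_key_name="projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key",
        wrapped_key=b"<wrapped-key-bytes>",  # placeholder, not a real key
        text="SSN: 372-84-1956",
    )
    # Sending it (requires credentials and a real KMS-wrapped key):
    #   from google.cloud import dlp_v2
    #   response = dlp_v2.DlpServiceClient().deidentify_content(request=request)
    #   print(response.item.value)
    print(sorted(request))
```

For BigQuery tables, the same primitive transformation can instead be applied per column through record transformations, which keeps every column in place while replacing only the sensitive values.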
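The options differ mainly in what the masked column looks like afterwards. The toy cipher below is loosely modeled on the FFX/FF1 idea: a Feistel network over decimal strings with alternating unequal halves and modular addition. It is a hypothetical illustration only. It uses HMAC-SHA256 as the round function instead of AES, has no tweak, and is not secure. Cloud DLP uses its own FFX implementation. What it shows is the shape of the output: a 9-digit input yields a 9-digit token, the same key and input always give the same token, and the key holder can invert it. AES-256 ciphertext is opaque binary data, typically carrying an IV or salt and padding or an authentication tag. It no longer has the column's original format, and with a random salt the same input encrypts differently each time.

```python
import hashlib
import hmac


def _round_value(key: bytes, i: int, half: str, m: int) -> int:
    """Pseudo-random round function: HMAC-SHA256 reduced mod 10**m."""
    digest = hmac.new(key, f"{i}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** m)


def toy_fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Map a digit string (length >= 2) to another digit string of equal length."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # width of the half being updated
        c = (int(a) + _round_value(key, i, b, m)) % (10 ** m)
        a, b = b, str(c).zfill(m)  # zfill keeps leading zeros, so length is fixed
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(rounds)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a, m)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b


if __name__ == "__main__":
    key = b"hypothetical-demo-key"
    token = toy_fpe_encrypt(key, "372841956")
    print(token)  # some 9-digit string
    print(toy_fpe_decrypt(key, token))  # 372841956
```

Because the mapping is deterministic, equal SSNs map to equal tokens. Joins and per-entity features built on the column therefore keep working after masking. Randomized replacement does not preserve this property.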