
Answer-first summary for fast verification
Answer: Use the DLP API to de-identify the sensitive data before loading it into BigQuery.
Option B is correct because it uses Google Cloud's Data Loss Prevention (DLP) API to systematically identify and de-identify PII before the data is loaded into BigQuery. The approach is automated and scalable. It also preserves the analytical value of the data, because it transforms sensitive values (for example, by masking, replacing, or tokenizing them) instead of discarding the records. The community discussion showed 100% consensus on B, with the key point being that the DLP API is built specifically for this task. Option A is insufficient because it only restricts access: the PII would still be stored in BigQuery, which violates the policy. Option C places PII in BigQuery, violating the requirement that PII remain in Cloud Storage. Option D relies on manual effort, so it does not scale and is prone to human error.
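To make option B concrete, here is a minimal sketch of a DLP de-identification call, assuming the `google-cloud-dlp` Python client library and application default credentials. The project ID, the list of infoTypes, and the helper names `build_deidentify_request` and `deidentify_text` are illustrative assumptions, not part of the question. Each finding is replaced with its infoType name, and only the de-identified text would then be loaded into BigQuery. The original files stay in Cloud Storage.

```python
"""Sketch: de-identify PII in text with the Cloud DLP API before loading
it into BigQuery. Assumes google-cloud-dlp and application default
credentials; the project ID and infoTypes below are placeholders."""

from typing import Any, Dict, Sequence

# Hypothetical set of infoTypes to detect; adjust to the actual data.
DEFAULT_INFO_TYPES: Sequence[str] = (
    "CREDIT_CARD_NUMBER",
    "EMAIL_ADDRESS",
    "PERSON_NAME",
    "PHONE_NUMBER",
)


def build_deidentify_request(
    project_id: str, text: str, info_types: Sequence[str] = DEFAULT_INFO_TYPES
) -> Dict[str, Any]:
    """Build a deidentify_content request that replaces every finding
    with the name of its infoType (e.g. an email becomes [EMAIL_ADDRESS])."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # Which kinds of PII the inspection step should look for.
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # How each finding is transformed: replaced with its infoType name.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        "item": {"value": text},
    }


def deidentify_text(project_id: str, text: str) -> str:
    """Send the request to the DLP API and return the de-identified text."""
    # Imported lazily so the request builder works without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, text)
    )
    return response.item.value
```

For example, `deidentify_text("my-project", "Contact jane@example.com")` should typically return the text with the address replaced by `[EMAIL_ADDRESS]`. Replacing values with their infoType keeps the record structure intact for analysis. If the fraud model needs the same card number to map to the same token across transactions, a deterministic transformation such as `crypto_deterministic_config` (which requires managing a key) could be swapped in for `replace_with_info_type_config`.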
Author: LeetQuiz Editorial Team
You need to train a model to predict fraudulent credit card transactions using unstructured data stored in Cloud Storage. The data must be converted to a structured format for analysis in BigQuery. Due to company policy, any data containing personally identifiable information (PII) must remain in Cloud Storage. You need to implement a scalable solution that preserves the data's analytical value. What should you do?
A
Use BigQuery’s authorized views and column-level access controls to restrict access to PII within the dataset.
B
Use the DLP API to de-identify the sensitive data before loading it into BigQuery.
C
Store the unstructured data in a separate PII-compliant BigQuery database.
D
Remove the sensitive data from the files manually before loading them into BigQuery.