
Answer-first summary for fast verification
Answer:
1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud DLP, with format-preserving encryption (FFX) as the de-identification transformation type.
2. Load the booking and user profile data into a BigQuery table.
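The de-identification step in the answer can be sketched as a Cloud DLP configuration. This is a minimal sketch rather than a definitive pipeline: the helper name `build_email_fpe_config`, the column name `email`, the KMS-wrapped key inputs, and the custom alphabet are all assumptions made for illustration. The dict follows the shape of the DLP `DeidentifyConfig` message (`record_transformations` with a `crypto_replace_ffx_fpe_config` primitive transformation).

```python
import string

# Assumption: the characters expected in the email column. Adjust this to the
# data actually ingested; each character may appear only once.
EMAIL_ALPHABET = string.ascii_lowercase + string.digits + "@._-+"


def build_email_fpe_config(wrapped_key: bytes, kms_key_name: str,
                           field_name: str = "email") -> dict:
    """Return a DeidentifyConfig-shaped dict applying FPE-FFX to one column.

    `wrapped_key` and `kms_key_name` are hypothetical inputs: a data key
    wrapped by Cloud KMS and the resource name of the KMS key that wrapped it.
    """
    fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": kms_key_name,
            }
        },
        "custom_alphabet": EMAIL_ALPHABET,
    }
    return {
        "record_transformations": {
            "field_transformations": [
                {
                    # Only the email column is transformed; other columns pass through.
                    "fields": [{"name": field_name}],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": fpe_config
                    },
                }
            ]
        }
    }
```

A dict like this would typically be passed as `deidentify_config` in a `deidentify_content` request from the `google-cloud-dlp` client, together with the table rows read from the CSV files. The same configuration and key would need to be applied to both the booking and the user profile files so that matching emails map to matching tokens.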
Format-preserving encryption (FPE) with FFX in Cloud DLP is a strong choice for de-identifying PII such as email addresses. FPE preserves the length and character set of the input and is deterministic: with the same crypto key (and the same context tweak, if one is configured), the same input always produces the same encrypted output. As long as both datasets are de-identified with the same key, matching email values in the booking and user profile data are encrypted to the same token, so analysts can join the tables in BigQuery accurately without seeing the actual email addresses. Masking (Option A) would not preserve the uniqueness required for joins, and dynamic data masking (Options C and D) is applied inside BigQuery, so it does not satisfy the requirement to de-identify the data before loading it into BigQuery.
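The join argument above can be illustrated with a small, self-contained sketch. Note the assumptions: keyed HMAC-SHA256 is used here only as a stand-in for a deterministic transformation, it is not FFX and does not preserve format, and the sample rows, key, and function names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for illustration


def pseudonymize(email: str) -> str:
    """Deterministic keyed token: same email and same key give the same token."""
    digest = hmac.new(SECRET_KEY, email.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask(email: str) -> str:
    """Character masking: every character becomes '#', so uniqueness is lost."""
    return "#" * len(email)


# Hypothetical sample rows: (email, booking id) and (email, loyalty tier).
bookings = [("ann@example.com", "BK-1"), ("bob@example.com", "BK-2")]
profiles = [("ann@example.com", "Gold"), ("bob@example.com", "Basic")]


def join_on(transform):
    """Join bookings to profiles on the transformed email only."""
    return [
        (booking_id, tier)
        for booking_email, booking_id in bookings
        for profile_email, tier in profiles
        if transform(booking_email) == transform(profile_email)
    ]


correct_join = join_on(pseudonymize)  # one match per booking
masked_join = join_on(mask)           # every booking matches every profile
```

With the deterministic token, each booking pairs with exactly one profile. With masking, the two emails have the same length and collapse to the same masked value, so the join produces spurious matches.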
Author: LeetQuiz Editorial Team
Your company operates a data platform that continually ingests CSV file dumps containing booking and user profile data from upstream sources into Google Cloud Storage. The analyst team needs to perform a join operation on these datasets using the common email field for their analysis. However, it is crucial to ensure that personally identifiable information (PII) is not exposed to the analysts during this process. To achieve this, you must de-identify the email field in both datasets prior to loading them into BigQuery. What approach should you take?
A
B
C
D