Google Professional Data Engineer


You are building a real-time prediction engine that ingests files, some of which may contain Personally Identifiable Information (PII), into Cloud Storage and then loads them into BigQuery. How can the Cloud Data Loss Prevention API (DLP API) be used to mask the sensitive data while preserving referential integrity, given that names and emails are often used as join keys?




Explanation:

The correct approach is to create a pseudonym by replacing the PII with a cryptographic format-preserving token. The DLP API supports this through format-preserving encryption: a given input encrypted with the same key always produces the same token, so masked names and emails still match across tables and remain usable as join keys, while the token keeps the length and character set of the original value. The other options fall short because they store unredacted data, lack the consistent, format-preserving output needed for referential integrity, or do not mask the data before it is stored.
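
To illustrate why a deterministic token preserves referential integrity, here is a minimal sketch in plain Python. It does not call the DLP API and it is not format-preserving encryption: it swaps in a keyed HMAC-SHA256 from the standard library as a stand-in, purely to show that identical inputs map to identical tokens and that joins on the masked column still work. The key, the `pseudonymize` and `mask` helpers, and the sample tables are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real pipeline would keep the
# key in a key management service rather than in source code.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic token for value.

    Stand-in technique: HMAC-SHA256 truncated to 16 hex characters.
    Unlike format-preserving encryption, this is one-way and does not
    keep the original length or character set.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask(rows):
    """Replace the email column of each row with its token."""
    return [{**row, "email": pseudonymize(row["email"])} for row in rows]


# Two hypothetical tables that share email as a join key.
customers = [
    {"email": "ada@example.com", "plan": "pro"},
    {"email": "bob@example.com", "plan": "free"},
]
events = [
    {"email": "ada@example.com", "event": "login"},
    {"email": "bob@example.com", "event": "purchase"},
    {"email": "ada@example.com", "event": "logout"},
]

masked_customers = mask(customers)
masked_events = mask(events)

# Join the masked tables on the token instead of the raw email.
plan_by_token = {c["email"]: c["plan"] for c in masked_customers}
joined = [(e["event"], plan_by_token[e["email"]]) for e in masked_events]

print(joined)
# → [('login', 'pro'), ('purchase', 'free'), ('logout', 'pro')]
```

The join succeeds even though neither masked table contains a raw email address. A random or per-record mask would break this lookup, which is why the deterministic property matters for join keys.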