As a Google Cloud Professional Machine Learning Engineer, you are tasked with building a real-time prediction engine that streams files to Google Cloud. Some of these files may contain Personally Identifiable Information (PII). To ensure compliance with data privacy regulations, you plan to use the Cloud Data Loss Prevention (DLP) API to scan for PII and protect sensitive data. How should you structure your data pipeline to ensure that PII is not accessible by unauthorized individuals?
Explanation:
Option D is the correct answer. Creating three Cloud Storage buckets (Quarantine, Sensitive, and Non-sensitive) ensures that data is scanned and classified before it reaches general users. All incoming data is written first to the Quarantine bucket, which serves as a staging area. The Cloud Data Loss Prevention (DLP) API then scans the Quarantine bucket, either periodically or automatically, and each file is moved to the Sensitive or the Non-sensitive bucket based on the scan results. Because unscanned data stays in Quarantine and files found to contain PII go only to the Sensitive bucket, access to PII can be restricted to authorized individuals, while data with no findings is served from the Non-sensitive bucket. This separation minimizes the risk of unauthorized access to PII.
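The routing step described above can be sketched in Python. This is a minimal illustration, not the reference implementation for the exam answer: the bucket names, `choose_destination`, and `route_quarantined` are hypothetical, and the two callables passed in stand in for a real DLP inspection call and a real Cloud Storage move. The choice to leave a file in Quarantine when a scan fails is also an assumption, made so that unscanned data is never released.

```python
from typing import Callable, Dict, Iterable

# Hypothetical bucket names, chosen for illustration only.
QUARANTINE_BUCKET = "quarantine-bucket"
SENSITIVE_BUCKET = "sensitive-bucket"
NON_SENSITIVE_BUCKET = "non-sensitive-bucket"


def choose_destination(finding_count: int) -> str:
    """Pick the target bucket from the number of PII findings in a scan."""
    return SENSITIVE_BUCKET if finding_count > 0 else NON_SENSITIVE_BUCKET


def route_quarantined(
    object_names: Iterable[str],
    count_findings: Callable[[str], int],
    move: Callable[[str, str], None],
) -> Dict[str, str]:
    """Scan each quarantined object and move it to the matching bucket.

    count_findings(name) is assumed to wrap a DLP scan and return the
    number of findings; move(name, bucket) is assumed to wrap a Cloud
    Storage copy-and-delete. Both are injected so the routing logic
    stays independent of any client library.
    """
    routed: Dict[str, str] = {}
    for name in object_names:
        try:
            findings = count_findings(name)
        except Exception:
            # Assumption: fail closed. An object that could not be
            # scanned stays in Quarantine rather than being released.
            routed[name] = QUARANTINE_BUCKET
            continue
        destination = choose_destination(findings)
        move(name, destination)
        routed[name] = destination
    return routed
```

In a deployed pipeline, a function like this would typically run on a schedule or in response to upload events, with access to the Quarantine and Sensitive buckets restricted to the scanning service and authorized users.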