
You are tasked with designing a real-time prediction engine on Google Cloud that processes incoming files, some of which may contain Personally Identifiable Information (PII). To ensure compliance with data protection standards, you plan to use the Cloud Data Loss Prevention (DLP) API to scan these files. The solution must minimize the risk of unauthorized access to PII, be cost-effective, and scale to handle varying loads. Given these requirements, what is the most secure and efficient approach to managing the data? Choose the best option.
A
Directly stream all files to BigQuery and use scheduled DLP API scans to identify and classify PII. This approach leverages BigQuery's analytics capabilities but may expose PII during the scanning interval.
B
Implement a two-bucket system: 'Processing' and 'Archived'. Stream files to the 'Processing' bucket, apply DLP API scans in real-time, and then move files to 'Archived' based on scan results. This reduces exposure time but may increase costs due to real-time scanning.
C
Use a single bucket for all files, applying DLP API scans periodically. Files identified as containing PII are then encrypted. This method is simple but leaves PII exposed until the scan completes.
D
Establish a three-bucket system: 'Quarantine', 'Sensitive', and 'Non-sensitive'. All incoming files are initially placed in 'Quarantine'. Periodic DLP API scans then move files to 'Sensitive' or 'Non-sensitive' based on content, ensuring PII is never exposed in an unsecured state.
E
Combine streaming files directly into BigQuery with real-time DLP API scans before storage. This ensures immediate PII detection but may not be cost-effective for high-volume data streams.
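The quarantine pattern described in option D can be automated so that no human ever touches files in an unclassified state: a function triggered on each upload to the 'Quarantine' bucket inspects the file with the DLP API and moves it to the appropriate destination. The sketch below assumes a Cloud Functions GCS-trigger entry point; the bucket names, project ID, and infoTypes are illustrative assumptions, not values from the question.

```python
# Hypothetical sketch of option D's three-bucket quarantine pattern.
# Assumed names: example-sensitive, example-non-sensitive, example-project.

SENSITIVE_BUCKET = "example-sensitive"
NON_SENSITIVE_BUCKET = "example-non-sensitive"


def route_bucket(findings):
    """Pure routing decision: any DLP finding sends the file to 'Sensitive'."""
    return SENSITIVE_BUCKET if findings else NON_SENSITIVE_BUCKET


def classify_file(event, context):
    """Cloud Functions entry point for a GCS 'finalize' event (a sketch)."""
    # Imported lazily so route_bucket() stays importable without the
    # google-cloud-* client libraries installed.
    from google.cloud import dlp_v2, storage

    bucket_name, blob_name = event["bucket"], event["name"]
    storage_client = storage.Client()
    source_bucket = storage_client.bucket(bucket_name)
    blob = source_bucket.blob(blob_name)
    content = blob.download_as_bytes()

    # Inspect the file contents for PII with the DLP API.
    dlp = dlp_v2.DlpServiceClient()
    response = dlp.inspect_content(
        request={
            "parent": "projects/example-project",  # assumed project ID
            "inspect_config": {
                "info_types": [
                    {"name": "EMAIL_ADDRESS"},
                    {"name": "PHONE_NUMBER"},
                ],
            },
            "item": {"byte_item": {"type_": "BYTES_UNSPECIFIED", "data": content}},
        }
    )

    # Move the file out of Quarantine based on the scan result, then delete
    # the original so PII never lingers in an unclassified location.
    dest_bucket = storage_client.bucket(route_bucket(response.result.findings))
    source_bucket.copy_blob(blob, dest_bucket, blob_name)
    blob.delete()
```

Because classification happens on upload and the original is deleted immediately after the copy, files spend minimal time in 'Quarantine', which is the property that makes option D the most secure of the choices above.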