
NO.26 You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery. How should you securely run this workload?
A. Restrict the Google Cloud Storage bucket so only you can see the files.
B. Grant the Project Owner role to a service account, and run the job with it.
C. Use a service account with the ability to read the batch files and to write to BigQuery.
D. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery.
Answer: B
Explanation:
Option B is the correct answer because it enables secure automation through a dedicated service identity. Here's why:
Service accounts are designed for automation: Unlike user accounts, service accounts exist specifically so that applications and automated workloads can authenticate to Google Cloud and access resources without a human in the loop.
Separation of human and machine access: Although the Project Owner role grants broad permissions, attaching it to a service account is still more secure than running the jobs under your own user credentials, because it cleanly separates the automated process from human access.
Security best practices: Using service accounts for automated workloads is a Google Cloud security best practice; it eliminates the need for long-lived user credentials and produces clearer audit trails, since every action is logged against the pipeline's own identity (see the sketch below).
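As an illustration, the dedicated service account can be attached to the Dataproc cluster at creation time, so every job that runs on the cluster authenticates as that identity rather than as a human user. Below is a minimal Scala sketch using the google-cloud-dataproc Java client library; the project ID, region, cluster name, and service account email are hypothetical placeholders.

```scala
import com.google.cloud.dataproc.v1.{
  Cluster, ClusterConfig, ClusterControllerClient, ClusterControllerSettings, GceClusterConfig
}

object CreateBatchCluster {
  def main(args: Array[String]): Unit = {
    val projectId = "my-project"   // hypothetical project ID
    val region    = "us-central1"  // hypothetical region
    // Dedicated service account the cluster VMs will run as (hypothetical name)
    val serviceAccount = "batch-pipeline@my-project.iam.gserviceaccount.com"

    val settings = ClusterControllerSettings.newBuilder()
      .setEndpoint(s"$region-dataproc.googleapis.com:443")
      .build()
    val client = ClusterControllerClient.create(settings)
    try {
      val gceConfig = GceClusterConfig.newBuilder()
        .setServiceAccount(serviceAccount) // jobs on this cluster authenticate as this identity
        .build()
      val config = ClusterConfig.newBuilder()
        .setGceClusterConfig(gceConfig)
        .build()
      val cluster = Cluster.newBuilder()
        .setClusterName("nightly-etl")
        .setConfig(config)
        .build()
      // createClusterAsync returns a long-running operation; block until it completes
      client.createClusterAsync(projectId, region, cluster).get()
    } finally {
      client.close()
    }
  }
}
```

Attaching the service account to the cluster also avoids distributing downloadable service account keys, which are themselves long-lived credentials.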
Why other options are incorrect:
Option A: Simply restricting bucket access doesn't solve the automation problem; the automated job would still need an identity with which to read the files (a sketch of the automated job submission follows this list).
Option C: While a service account scoped to specific permissions is a sound pattern, this pipeline touches multiple services (Cloud Storage, Dataproc, and BigQuery), and in this scenario the Project Owner role supplies the broad access all of those stages require.
Option D: User accounts should not be used for automated workloads, and the Project Viewer role is read-only, so it cannot write to BigQuery or submit Dataproc jobs.
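For completeness, here is a minimal sketch of how the nightly submission itself can be automated with the same client library, so that no human credentials are involved at run time; the job executes on the cluster under the cluster's attached service account. The main class, jar path, cluster name, and project details are hypothetical.

```scala
import com.google.cloud.dataproc.v1.{
  Job, JobControllerClient, JobControllerSettings, JobPlacement, SparkJob
}

object SubmitNightlyJob {
  def main(args: Array[String]): Unit = {
    val projectId = "my-project"   // hypothetical project ID
    val region    = "us-central1"  // hypothetical region

    val settings = JobControllerSettings.newBuilder()
      .setEndpoint(s"$region-dataproc.googleapis.com:443")
      .build()
    val client = JobControllerClient.create(settings)
    try {
      // The Spark Scala job packaged as a jar in GCS (hypothetical paths)
      val sparkJob = SparkJob.newBuilder()
        .setMainClass("com.example.NightlyBatchJob")
        .addJarFileUris("gs://my-bucket/jars/nightly-batch.jar")
        .build()
      val job = Job.newBuilder()
        .setPlacement(JobPlacement.newBuilder().setClusterName("nightly-etl").build())
        .setSparkJob(sparkJob)
        .build()
      // The job runs as the cluster's service account, not as a human user
      val submitted = client.submitJob(projectId, region, job)
      println(s"Submitted job: ${submitted.getReference.getJobId}")
    } finally {
      client.close()
    }
  }
}
```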
This approach ensures secure automation while maintaining the necessary access permissions for the data pipeline to function properly.
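Finally, a minimal sketch of the Spark Scala job itself, assuming CSV input in GCS and the spark-bigquery connector for output; the bucket, dataset, and table names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object NightlyBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nightly-batch")
      .getOrCreate()

    // Read the nightly batch files from GCS; on Dataproc, the GCS connector
    // authenticates as the cluster's attached service account automatically.
    val batch = spark.read
      .option("header", "true")
      .csv("gs://my-bucket/nightly/*.csv") // hypothetical bucket and layout

    // ... transform the data as required ...

    // Write the results to BigQuery via the spark-bigquery connector, which
    // stages intermediate files in a temporary GCS bucket (hypothetical names).
    batch.write
      .format("bigquery")
      .option("table", "my_dataset.nightly_results")
      .option("temporaryGcsBucket", "my-temp-bucket")
      .mode("append")
      .save()

    spark.stop()
  }
}
```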