
Answer-first summary for fast verification
Answer: Use a service account that has permission to read the batch files and write to BigQuery.
The best practice for automated jobs is to run them under a service account granted only the least privilege needed for the task. In this scenario, the job must read batch files from Cloud Storage and write results to BigQuery. The most secure approach is therefore to create a service account with read access to the Cloud Storage bucket and write access to BigQuery, and run the job as that account. A user account ties the automation to an individual's credentials, and the Project Owner role grants far broader access than the job requires. Reference: [Google Cloud Dataproc Service Accounts](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts#dataproc_service_accounts_2)
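The setup above can be sketched with `gcloud` commands. This is a minimal illustration, assuming hypothetical names (`PROJECT_ID`, `BUCKET_NAME`, `batch-etl-sa`, `etl-cluster`, and the region are placeholders you would replace):

```shell
# Create a dedicated service account for the nightly job
# (all names below are hypothetical placeholders).
gcloud iam service-accounts create batch-etl-sa \
    --project=PROJECT_ID \
    --display-name="Nightly batch ETL"

# Grant read-only access to the batch files, scoped to the one bucket.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:batch-etl-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# Grant write access to BigQuery data plus the ability to run jobs.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:batch-etl-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:batch-etl-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"

# Attach the service account to the Dataproc cluster so the
# Spark Scala job runs with exactly these permissions.
gcloud dataproc clusters create etl-cluster \
    --region=us-central1 \
    --service-account="batch-etl-sa@PROJECT_ID.iam.gserviceaccount.com"
```

For tighter scoping still, the BigQuery roles can be granted on the target dataset rather than the whole project.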
Author: LeetQuiz Editorial Team
Nightly batch files containing non-public information land in Google Cloud Storage, must be processed by a Spark Scala job on a Google Cloud Dataproc cluster, and the results deposited into Google BigQuery. What is the most secure way to automate this pipeline?
A. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery.
B. Grant the Project Owner role to a service account and run the job using that account.
C. Restrict access to the Google Cloud Storage bucket to only allow you to see the files.
D. Use a service account that has permission to read the batch files and write to BigQuery.