
Answer-first summary for fast verification
Answer: Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
The correct answer is C. Cloud Dataflow is purpose-built for streaming pipelines such as Pub/Sub-to-BigQuery: its default autoscaling adds and removes workers in response to backlog and throughput, so fluctuating input volumes are absorbed without manual resizing. System lag, visible in Stackdriver, measures how long elements wait before being processed, making it the right signal for confirming the pipeline is keeping up. By contrast, the Dataproc options (A and B) require manually resizing or diagnosing the cluster, and option D relies on hand-tuning machine types, so none of them meets the requirement for minimal manual intervention.
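To make the "processes them" step concrete, here is a minimal sketch of the per-message transformation such a pipeline would apply. In a real Dataflow job this function would run inside an Apache Beam map step between the Pub/Sub read and the BigQuery write; the field names (`user_id`, `event_type`, `value`) are illustrative assumptions, not part of the question.

```python
import json

def transform_message(payload: bytes) -> dict:
    """Parse a JSON Pub/Sub payload into a BigQuery-ready row dict.

    In Dataflow, this logic would sit inside a Beam ParDo/Map;
    autoscaling then adds workers as the Pub/Sub backlog grows.
    """
    event = json.loads(payload.decode("utf-8"))
    # Hypothetical schema mapping; defaults guard against missing fields.
    return {
        "user_id": event["user_id"],
        "event_type": event.get("event_type", "unknown"),
        "value": float(event.get("value", 0)),
    }

row = transform_message(b'{"user_id": "u1", "value": "3.5"}')
print(row)
```

Because the transform is a pure function of each message, Dataflow can freely redistribute work across workers as it scales, which is what makes the default autoscaling setting effective here.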
Author: LeetQuiz Editorial Team
You are tasked with designing a cost-efficient data pipeline on Google Cloud that ingests JSON messages from Cloud Pub/Sub, processes them, and subsequently loads the transformed data into BigQuery. The solution must accommodate fluctuating input data volumes with minimal manual intervention. Which approach should you take to achieve this?
A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.