
Google Professional Data Engineer
You are responsible for maintaining a data pipeline in which a Dataflow job aggregates time series metrics and writes them to Bigtable. This dataset is crucial because it feeds a dashboard used by thousands of users across the organization. Recently, latency has been observed in data updates to Bigtable, which is degrading the dashboard's performance and responsiveness. To address this issue, you need to support additional concurrent users and decrease the time it takes to write data to Bigtable. Which two actions should you undertake? (Choose two.)
Explanation:
Increasing the maximum number of Dataflow workers by setting maxNumWorkers in PipelineOptions lets the job parallelize processing across more workers, which shortens the time needed to write updates to Bigtable. Increasing the number of nodes in the Bigtable cluster raises overall throughput and reduces write latency, allowing Bigtable to sustain a higher rate of data ingestion and serve more queries, which is essential for supporting additional concurrent users.
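As a minimal sketch of the first action, assuming the pipeline is a Java/Apache Beam job run on the Dataflow runner (the class name, worker count, and transforms below are illustrative, not taken from the question):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MetricsPipeline {
  public static void main(String[] args) {
    // Parse command-line options and view them as Dataflow-specific options.
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);

    options.setRunner(DataflowRunner.class);
    // Raise the autoscaling ceiling so Dataflow can add workers and
    // parallelize the aggregation and Bigtable writes. 20 is an example value.
    options.setMaxNumWorkers(20);

    Pipeline pipeline = Pipeline.create(options);
    // ... existing aggregation and BigtableIO write transforms ...
    pipeline.run();
  }
}

For the second action, the cluster's node count can be increased from the console, gcloud, or programmatically. A hedged sketch using the Bigtable admin client (instance ID, cluster ID, and node count are placeholders):

import com.google.cloud.bigtable.admin.v2.BigtableInstanceAdminClient;

public class ResizeCluster {
  public static void main(String[] args) throws Exception {
    try (BigtableInstanceAdminClient admin =
        BigtableInstanceAdminClient.create("my-project")) {
      // Scale the cluster to 10 nodes to increase write throughput.
      admin.resizeCluster("my-instance", "my-cluster", 10);
    }
  }
}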