Google Professional Data Engineer



An IoT service uses Bigtable to store time-series data. However, write operations are landing predominantly on a single node instead of being distributed evenly across all nodes. What is the most likely underlying cause of this issue?




Explanation:

This scenario is an example of hotspotting, where the workload is concentrated on one or a few nodes instead of being spread evenly. Bigtable stores rows in lexicographic order by row key and assigns contiguous ranges of rows to nodes, so rows whose keys are lexically similar and are created at about the same time (for example, keys that begin with a timestamp) all land on the same node. The distribution of writes is determined by the row key, not by any GCP load balancer. Replication does not affect where data is initially written. Because Bigtable is a wide-column database, it can hold a very large number of columns, and the number of columns does not influence how data is distributed across nodes. For more details, refer to the Bigtable documentation on performance.
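
The effect of row key design can be illustrated with a small simulation. The sketch below is a toy model, not Bigtable itself: it assumes a table whose sorted key space has already been split into four contiguous ranges, and it counts where a burst of new writes would land under two key layouts. All names (`timestamp_first_key`, `device_first_key`, `split_points`, `count_writes`) and the key formats are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter
from typing import Callable, List

# Hypothetical fleet of 20 devices.
DEVICES = [f"device{i:03d}" for i in range(20)]


def timestamp_first_key(device_id: str, ts: int) -> str:
    # Timestamp prefix: keys written at the same time sort next to each other.
    return f"{ts:013d}#{device_id}"


def device_first_key(device_id: str, ts: int) -> str:
    # Device prefix: concurrent writes are scattered across the key space.
    return f"{device_id}#{ts:013d}"


def split_points(keys: List[str], n_ranges: int) -> List[str]:
    # Cut the sorted key space into n_ranges equally sized contiguous ranges.
    ordered = sorted(keys)
    step = len(ordered) // n_ranges
    return [ordered[i * step] for i in range(1, n_ranges)]


def count_writes(make_key: Callable[[str, int], str]) -> Counter:
    # Existing data: every device reported at timestamps 0..99.
    history = [make_key(d, ts) for ts in range(100) for d in DEVICES]
    splits = split_points(history, n_ranges=4)
    # New burst: every device reports at timestamps 100..109.
    new_keys = [make_key(d, ts) for ts in range(100, 110) for d in DEVICES]
    # bisect_right returns the index of the range that owns each key.
    return Counter(bisect_right(splits, key) for key in new_keys)


ts_counts = count_writes(timestamp_first_key)
dev_counts = count_writes(device_first_key)

print("timestamp-first:", dict(ts_counts))
print("device-first:   ", dict(sorted(dev_counts.items())))
```

In this model, every write with a timestamp-first key falls into the last range, because each new key sorts after all existing keys. With a device-first key, the same 200 writes are spread evenly over the four ranges. Real tablet splitting and rebalancing in Bigtable are more involved, but the ordering argument is the same.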