
NO.44 Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?
Explanation:
This question addresses performance optimization in Google Cloud Bigtable when dealing with large-scale data ingestion and processing.
Analysis of Options:
Option A (Correct): Redefining the schema to evenly distribute reads and writes across the row space is the optimal solution. In Bigtable, performance is heavily dependent on row key design. When reads and writes are concentrated on a small number of rows ("hotspotting"), it creates bottlenecks. By distributing the workload evenly across the row space, you can leverage Bigtable's automatic sharding and achieve better parallelism.
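One common way to distribute keys across the row space is to prefix each key with a short hash of a high-cardinality field. The field names and key format below are illustrative assumptions, not part of the question:

```python
import hashlib

def build_row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build a Bigtable-style row key that spreads writes across the row space.

    A short hash of the user ID is used as a prefix, so concurrent writes
    for different users sort far apart and land on different tablets
    instead of concentrating on one "hot" key range.
    """
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]
    return f"{prefix}#{user_id}#{timestamp_ms}".encode()

# Keys for different users get different prefixes, even when written
# at the same instant:
k1 = build_row_key("user-1001", 1700000000000)
k2 = build_row_key("user-1002", 1700000000000)
```

Because the full user ID and timestamp remain in the key, point lookups for a known user are still possible; only unkeyed range scans become harder.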
Option B (Incorrect): Simply waiting for the cluster size to increase won't resolve the fundamental schema design issue. Hotspotting will persist regardless of cluster size, and this approach would increase costs without solving the performance problem.
Option C (Incorrect): Using a single row key for frequently updated values would actually worsen the hotspotting problem. This concentrates all updates on one row, creating a severe bottleneck.
Option D (Incorrect): Using sequentially increasing numeric IDs can create hotspotting because new data always goes to the "end" of the table, concentrating writes on a small number of tablets.
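The effect described in Option D is easy to see by sorting keys the way Bigtable stores them, lexicographically. Zero-padded sequential IDs form one contiguous, ever-growing range, while hashed versions of the same IDs scatter across the key space. A minimal sketch:

```python
import hashlib

# Sequential, zero-padded numeric IDs: every new key sorts after all
# existing keys, so all new writes target the last tablet in the table.
seq_keys = [f"{i:08d}".encode() for i in range(1000, 1100)]
assert seq_keys == sorted(seq_keys)  # newest key is always the lexicographic max

# Hashing the same IDs spreads them over the key space, so writes are
# distributed across many tablets rather than one.
hashed_keys = [hashlib.md5(k).hexdigest().encode() for k in seq_keys]
```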
Best Practice: The optimal approach is to design row keys that distribute the workload evenly, such as using hash prefixes, salting, or other techniques that spread operations across the entire row space.
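Salting, one of the techniques mentioned above, can be sketched as follows. The bucket count and key format here are illustrative assumptions, not a fixed Bigtable API:

```python
import hashlib

NUM_BUCKETS = 8  # illustrative; often sized relative to the node count

def salted_key(base_key: str) -> bytes:
    """Prefix the key with a deterministic salt bucket so writes fan out."""
    bucket = int(hashlib.md5(base_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket}#{base_key}".encode()

def scan_prefixes(base_prefix: str) -> list[bytes]:
    """Range reads must fan out across every salt bucket for a prefix."""
    return [f"{b}#{base_prefix}".encode() for b in range(NUM_BUCKETS)]
```

The trade-off is visible in `scan_prefixes`: salting evens out write load, but any range scan now has to issue one read per bucket and merge the results, so it suits write-heavy workloads better than scan-heavy ones.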