Google Professional Data Engineer


As a data engineer, you are tasked with selecting an appropriate database solution for storing time-series data on CPU and memory usage across millions of computers. The data is collected as one-second samples. The database must support real-time, ad hoc analytics by analysts. Additionally, the database and its schema design must be cost-efficient, avoiding charges per query executed, and must scale as the dataset grows. Which database and data model would you choose to meet these requirements?




Explanation:

The correct answer is C. For this time-series workload, a narrow table in Bigtable, where each row key combines the computer identifier with the sample time at one-second resolution, is the most suitable design. Storing one event per row keeps queries simple, avoids the risk of exceeding the recommended maximum row size, and lets the table grow with the dataset without redesign. Bigtable provides low-latency reads and writes at large scale, which supports real-time, ad hoc analytics, and its pricing is based on provisioned nodes and storage rather than per query executed, which satisfies the cost requirement.
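
To make the row-key pattern concrete, here is a minimal Python sketch using the google-cloud-bigtable client. The project, instance, table, and column-family names ("my-project", "metrics-instance", "machine-metrics", "metrics") and the helper functions are illustrative placeholders, not part of the question; treat this as a sketch of the schema idea rather than a reference implementation.

```python
# Minimal sketch of the narrow-table, one-event-per-row pattern in Bigtable.
# Assumes the google-cloud-bigtable Python client and placeholder resource names.
from datetime import datetime, timezone

from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")          # placeholder project
instance = client.instance("metrics-instance")           # placeholder instance
table = instance.table("machine-metrics")                # placeholder table


def row_key(machine_id: str, sample_time: datetime) -> bytes:
    # Row key = computer identifier + '#' + second-resolution timestamp,
    # so one machine's samples sort contiguously and can be range-scanned.
    return f"{machine_id}#{sample_time.strftime('%Y%m%d%H%M%S')}".encode()


def write_sample(machine_id: str, sample_time: datetime,
                 cpu_percent: float, mem_percent: float) -> None:
    # One event per row: each one-second sample becomes its own narrow row.
    row = table.direct_row(row_key(machine_id, sample_time))
    row.set_cell("metrics", b"cpu_percent", str(cpu_percent).encode())
    row.set_cell("metrics", b"mem_percent", str(mem_percent).encode())
    row.commit()


def read_machine_range(machine_id: str, start: datetime, end: datetime):
    # Ad hoc analytics: scan all samples for one machine in a time window
    # using a bounded row-range scan over the key space.
    row_set = RowSet()
    row_set.add_row_range_from_keys(
        start_key=row_key(machine_id, start),
        end_key=row_key(machine_id, end),
    )
    return table.read_rows(row_set=row_set)


# Example usage (hypothetical machine id and time window):
now = datetime.now(timezone.utc)
write_sample("machine-0001", now, cpu_percent=42.5, mem_percent=63.1)
for row in read_machine_range("machine-0001",
                              now.replace(hour=0, minute=0, second=0), now):
    print(row.row_key, row.cells["metrics"])
```

Leading the key with the machine identifier spreads writes across millions of key prefixes, which avoids hotspotting a single tablet, while the timestamp suffix keeps each machine's samples adjacent so time-range scans stay efficient as the dataset grows.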