
Answer-first summary for fast verification
Answer: Use a timestamp range filter in the query to fetch the customer's data for a specific range.
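The correct answer is B: 'Use a timestamp range filter in the query to fetch the customer's data for a specific range.' Garbage collection in Bigtable is a continuous background process, and it can take up to a week for expired data to be deleted. Until that happens, expired cells can still be returned by reads, so a garbage collection policy alone does not guarantee that analysts never see data older than 30 days. A timestamp range filter restricts each query to the desired window regardless of when garbage collection runs, and it requires no extra jobs or infrastructure. The other options fall short: shortening the expiry to 29 days (A) is still subject to the same deletion delay, a daily scan-and-delete job (C) adds cost and operational overhead, and keeping 2 versions (D) does nothing to hide expired data.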
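The filter described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the `google-cloud-bigtable` client library is installed, that `table` is an already-opened `Table` object, and that cell timestamps reflect when the order data was written. The helper names `retention_cutoff` and `read_recent_rows` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 30  # matches the 30-day policy in the question


def retention_cutoff(now: datetime, days: int = RETENTION_DAYS) -> datetime:
    """Return the oldest cell timestamp analysts should still see."""
    cutoff = now - timedelta(days=days)
    # Bigtable timestamps are commonly stored at millisecond granularity,
    # so drop any sub-millisecond precision from the cutoff.
    return cutoff.replace(microsecond=cutoff.microsecond // 1000 * 1000)


def read_recent_rows(table, now: Optional[datetime] = None):
    """Read rows, returning only cells inside the retention window.

    Assumption: `table` is a google.cloud.bigtable.table.Table.
    """
    # Imported lazily so retention_cutoff() works without the client library.
    from google.cloud.bigtable.row_filters import (
        TimestampRange,
        TimestampRangeFilter,
    )

    now = now or datetime.now(timezone.utc)
    # The start of a TimestampRange is inclusive; leaving `end` unset
    # means there is no upper bound.
    recent_only = TimestampRangeFilter(TimestampRange(start=retention_cutoff(now)))
    return table.read_rows(filter_=recent_only)
```

Because the filter is evaluated on the server at read time, cells that have expired but not yet been garbage collected are simply excluded from the results.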
Author: LeetQuiz Editorial Team
You are a data engineer at a large ecommerce company, and your customer order data is stored in Bigtable. The current garbage collection policy deletes data after 30 days, and the number of versions is set to 1. However, data analysts occasionally encounter customer data older than 30 days when running queries to report total customer spending. You need to ensure that analysts see only customer data that is no older than 30 days, while minimizing both cost and overhead. How would you achieve this?
A. Set the expiring values of the column families to 29 days and keep the number of versions to 1.
B. Use a timestamp range filter in the query to fetch the customer's data for a specific range.
C. Schedule a job daily to scan the data in the table and delete data older than 30 days.
D. Set the expiring values of the column families to 30 days and set the number of versions to 2.