© 2025 LeetQuiz All rights reserved.
Google Professional Data Engineer



Your company employs a proprietary system to transmit inventory data to a cloud-based data ingestion service at 6-hour intervals. Each transmitted dataset comprises multiple fields and includes a timestamp indicating the exact time of transmission. In instances where doubts arise concerning a transmission's accuracy or integrity, the system is designed to resend the data. Given this context, what is the most efficient method to deduplicate the transmitted data?




Explanation:

The best answer is A: assign a globally unique identifier (GUID) to each data entry. Because the GUID belongs to the entry itself, a retransmission carries the same identifier as the original, so duplicates can be found by comparing identifiers alone. Hashing the payload (Option D) may seem effective, but it is less efficient and more error-prone: the transmission timestamp is part of the payload, so a resend of the same data produces a different hash. A GUID avoids this problem and identifies duplicates reliably.
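
The difference between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names (`guid`, `fields`, `sent_at`), the helper functions, and the sample inventory record are hypothetical and are not part of the question or of any Google Cloud API. It assumes the GUID is attached once, when the entry is created, and reused on every resend.

```python
import hashlib
import json
import uuid


def make_entry(fields: dict) -> dict:
    """Attach a GUID once, when the entry is first created (hypothetical helper)."""
    return {"guid": str(uuid.uuid4()), "fields": fields}


def transmit(entry: dict, sent_at: str) -> dict:
    """Build a transmission; a resend reuses the entry and its GUID with a new timestamp."""
    return {**entry, "sent_at": sent_at}


def payload_hash(message: dict) -> str:
    """Hash the whole payload, transmission timestamp included."""
    encoded = json.dumps(message, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def dedupe_by_guid(messages: list[dict]) -> list[dict]:
    """Keep only the first message seen for each GUID."""
    seen = set()
    unique = []
    for message in messages:
        if message["guid"] not in seen:
            seen.add(message["guid"])
            unique.append(message)
    return unique


# Hypothetical sample data: one entry sent twice with different timestamps.
entry = make_entry({"sku": "A-100", "quantity": 42})
original = transmit(entry, "2025-01-01T00:00:00Z")
resend = transmit(entry, "2025-01-01T00:05:00Z")

# The payload hashes differ because the timestamp changed on the resend.
hashes_differ = payload_hash(original) != payload_hash(resend)

# The GUID is identical across both transmissions, so one message survives.
deduped = dedupe_by_guid([original, resend])
```

In this sketch the hash comparison treats the resend as new data, while the GUID comparison collapses both transmissions into one record. Hashing only the non-timestamp fields would also work, but it requires serializing and hashing every payload, which is the extra cost the explanation refers to.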
