
Answer-first summary for fast verification
Answer: Assign global unique identifiers (GUID) to each data entry.
The best answer is A: assign a globally unique identifier (GUID) to each data entry. Because the identifier is attached to the entry itself, a re-transmitted entry carries the same GUID as the original, so duplicates can be found with a simple key lookup. Hash-based approaches (Options B and D) may seem effective, but they are less efficient and more error-prone: the transmission timestamp is part of the payload, so a resend of the same data content produces a different hash. A GUID avoids this problem and identifies duplicates reliably.
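The reasoning above can be illustrated with a minimal Python sketch. It assumes the GUID is assigned once when the entry is created and is reused on every resend; the field names (`guid`, `sku`, `quantity`, `sent_at`) and the helper functions are hypothetical and are not part of the system described in the question.

```python
import hashlib
import json
import uuid


def make_entry(sku, quantity):
    """Create an inventory entry and assign its GUID once, at the source (assumption)."""
    return {"guid": str(uuid.uuid4()), "sku": sku, "quantity": quantity}


def transmit(entry, sent_at):
    """Simulate a transmission: the payload is the entry plus a send timestamp."""
    return {**entry, "sent_at": sent_at}


def payload_hash(payload):
    """Hash the whole payload, timestamp included."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def deduplicate(payloads):
    """Keep only the first payload seen for each GUID."""
    seen = set()
    unique = []
    for payload in payloads:
        if payload["guid"] not in seen:
            seen.add(payload["guid"])
            unique.append(payload)
    return unique


entry = make_entry("WIDGET-1", 40)
original = transmit(entry, "2024-01-01T00:00:00Z")
resend = transmit(entry, "2024-01-01T00:05:00Z")  # same data, new timestamp

# The two payloads hash differently because the timestamp changed.
hashes_differ = payload_hash(original) != payload_hash(resend)

# The GUID is unchanged, so the resend is recognized as a duplicate.
deduped = deduplicate([original, resend])
```

In this sketch the hash comparison treats the resend as new data, while the GUID lookup keeps a single copy. A hash over only the non-timestamp fields would also match, but it requires agreeing on which fields to exclude and recomputing the hash for every entry.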
Author: LeetQuiz Editorial Team
Your company employs a proprietary system to transmit inventory data to a cloud-based data ingestion service at 6-hour intervals. Each transmitted dataset comprises multiple fields and includes a timestamp indicating the exact time of transmission. In instances where doubts arise concerning a transmission's accuracy or integrity, the system is designed to resend the data. Given this context, what is the most efficient method to deduplicate the transmitted data?
A. Assign global unique identifiers (GUID) to each data entry.
B. Compute the hash value of each data entry, and compare it with all historical data.
C. Store each data entry as the primary key in a separate database and apply an index.
D. Maintain a database table to store the hash value and other metadata for each data entry.