
Answer-first summary for fast verification
Answer: Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
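Option A is the most reliable method because it compares the full contents of both tables rather than a subset. Sorting the rows first makes the hash independent of row order, which matters because the tables have no primary key to align rows on. Excluding timestamp columns avoids mismatches from values that can legitimately differ between runs, such as load or processing times. Options B and C are less reliable because they check only samples: without a key, independently drawn random samples from the two tables will not contain the same rows, so comparing them cannot confirm that the tables are identical. Option D has the same limitation: OVER() defines the window for an analytic function and does not itself produce a sample, and stratified samples still cover only part of the data.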
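The hashing idea behind option A can be sketched locally in Python, without Dataproc or the BigQuery connector. This is a minimal sketch under assumptions that are not in the original: the rows are already loaded as dictionaries, the function name `table_fingerprint` and the column names (`customer`, `amount`, `loaded_at`) are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def table_fingerprint(rows, timestamp_columns):
    """Hash a table's rows, ignoring timestamp columns and row order.

    rows: iterable of dicts mapping column name -> value (assumed input shape).
    timestamp_columns: names of columns to leave out of the hash.
    """
    ignored = set(timestamp_columns)
    canonical_rows = []
    for row in rows:
        # Keep only non-timestamp columns, in a fixed column order.
        kept = sorted(
            ((col, val) for col, val in row.items() if col not in ignored),
            key=lambda pair: pair[0],
        )
        canonical_rows.append(repr(kept))

    # Sort the rows so the result does not depend on storage order.
    # A list (not a set) is used so duplicate rows still count.
    canonical_rows.sort()

    digest = hashlib.sha256()
    for line in canonical_rows:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical sample data: same content, different order and load times.
legacy_rows = [
    {"customer": "acme", "amount": 10, "loaded_at": "2023-01-01T00:00:00"},
    {"customer": "zeta", "amount": 25, "loaded_at": "2023-01-01T00:00:00"},
]
migrated_rows = [
    {"customer": "zeta", "amount": 25, "loaded_at": "2024-06-01T12:00:00"},
    {"customer": "acme", "amount": 10, "loaded_at": "2024-06-01T12:00:00"},
]

tables_match = (
    table_fingerprint(legacy_rows, ["loaded_at"])
    == table_fingerprint(migrated_rows, ["loaded_at"])
)
```

Equal fingerprints indicate that the non-timestamp contents match, including duplicate rows. A mismatch shows that the tables differ but not where, so a follow-up query would still be needed to locate the differing rows.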
Author: LeetQuiz Editorial Team
After migrating an ETL job to BigQuery, how can you compare its output with the original job's output when the tables lack a primary key for joining, but you have the original job's output table?
A. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
B. Select random samples from the tables using the RAND() function and compare the samples.
C. Select random samples from the tables using the HASH() function and compare the samples.
D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.