
Answer-first summary for fast verification
Answer: (A) Denormalize the data as much as possible; (D) Develop a data pipeline where status updates are appended to BigQuery instead of updated.
Denormalizing the data as much as possible (A) improves query performance by reducing the need for joins on large tables, which are computationally expensive at petabyte scale. Appending status updates to BigQuery instead of updating existing records (D) also improves performance, because BigQuery's columnar storage is optimized for appends, while frequent UPDATEs on thousands of rows per hour are costly. An append-only design additionally gives the data science team the full transaction history, which is valuable for training machine learning models.
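To make the append-only idea concrete, here is a minimal sketch in plain Python (not the BigQuery API). The field names `txn_id`, `status`, and `ts` are illustrative assumptions; the point is that every status change is appended as a new row, and the "current" status is derived by taking the latest event per transaction, much like a `ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY ts DESC)` query would in BigQuery:

```python
# Hypothetical append-only log of status events: each update is a new
# row, never an in-place mutation (mirrors strategy D above).
events = [
    {"txn_id": "t1", "status": "PENDING",   "ts": 1},
    {"txn_id": "t2", "status": "PENDING",   "ts": 2},
    {"txn_id": "t1", "status": "SHIPPED",   "ts": 3},
    {"txn_id": "t1", "status": "DELIVERED", "ts": 4},
]

def latest_statuses(rows):
    """Return the most recent status per transaction.

    Rows are processed in timestamp order, so the last write
    per txn_id wins -- the dict ends up holding current state,
    while the full history stays intact in `rows`.
    """
    latest = {}
    for row in sorted(rows, key=lambda r: r["ts"]):
        latest[row["txn_id"]] = row["status"]
    return latest

print(latest_statuses(events))
```

Because the raw events are never overwritten, the same table serves both needs: the derived view answers "what is the status now?", and the untouched history feeds model training.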
Author: LeetQuiz Editorial Team
You are tasked with developing a data pipeline to facilitate the transfer of time-series transaction data to BigQuery for analysis by your data science team. This data will be used for building machine learning models. The current dataset is 1.5 PB and increases by 3 TB daily. Thousands of transactions are updated with new statuses every hour. Given that the data is heavily structured, which two strategies would maximize performance and usability for your data science team? (Choose two.)
A
Denormalize the data as much as possible.
B
Preserve the structure of the data as much as possible.
C
Use BigQuery UPDATE to further reduce the size of the dataset.
D
Develop a data pipeline where status updates are appended to BigQuery instead of updated.
E
Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.