
Answer-first summary for fast verification
Answer: (A) Develop a data pipeline where status updates are appended to BigQuery instead of updated, and (D) Denormalize the data as much as possible.
The optimal strategies for this scenario are:

- **Appending status updates to BigQuery (A):** Appending is far more efficient than in-place updates for data that changes hourly, preserves a full historical record of status changes, and matches the dataset's append-heavy growth pattern (3 TB per day).
- **Denormalizing the data (D):** Denormalization trades redundant storage for fewer joins, which improves query performance and simplifies feature extraction for machine learning models.

The other options are less suitable:

- **Using BigQuery UPDATE (C):** DML updates are resource-intensive and inefficient when thousands of transactions change status every hour.
- **Preserving the data structure (E):** Keeping a normalized structure forces joins at query time and does not optimize query performance as effectively as denormalization.
- **Storing snapshots in Cloud Storage (B):** Daily Avro snapshots queried as external tables add operational complexity and cannot keep up with hourly status updates.
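The append-only pattern in option A works because the "current" state can always be recovered at query time by keeping the latest row per transaction. A minimal sketch of that dedup-on-read logic in plain Python (the table name, column names, and sample rows are illustrative assumptions, not part of the question):

```python
from datetime import datetime

# Append-only event log: every status change is a NEW row, never an UPDATE.
# In BigQuery this would be a streaming or batch insert (strategy A).
events = [
    {"txn_id": "t1", "status": "PENDING",   "ts": datetime(2024, 1, 1, 9)},
    {"txn_id": "t1", "status": "SHIPPED",   "ts": datetime(2024, 1, 1, 12)},
    {"txn_id": "t2", "status": "PENDING",   "ts": datetime(2024, 1, 1, 10)},
    {"txn_id": "t1", "status": "DELIVERED", "ts": datetime(2024, 1, 2, 8)},
]

def latest_status(rows):
    """Keep only the most recent row per transaction (dedup-on-read)."""
    latest = {}
    for r in rows:
        current = latest.get(r["txn_id"])
        if current is None or r["ts"] > current["ts"]:
            latest[r["txn_id"]] = r
    return latest

# The same dedup expressed as BigQuery SQL, using a window function.
# The fully-qualified table name is a placeholder.
DEDUP_SQL = """
SELECT * EXCEPT(rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY ts DESC) AS rn
  FROM `project.dataset.transactions`
)
WHERE rn = 1
"""

current = {k: v["status"] for k, v in latest_status(events).items()}
print(current)  # latest status per transaction
```

This keeps every historical status (useful as ML training signal) while still giving analysts a cheap view of the current state, without any DML updates.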
Author: LeetQuiz Editorial Team
To support the data science team's analysis, a data pipeline is required to transfer time-series transaction data into BigQuery. The dataset, initially 1.5 PB in size and growing by 3 TB daily, consists of thousands of transactions updated hourly with new statuses. This structured data will be utilized for building machine learning models. Which two strategies would best optimize performance and usability for the data science team? (Select two.)
A. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
B. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.
C. Use BigQuery UPDATE to further reduce the size of the dataset.
D. Denormalize the data as much as possible.
E. Preserve the structure of the data as much as possible.