
Answer-first summary for fast verification
Answer: A and E. Denormalize the data as much as possible, and copy a daily snapshot of transaction data to Cloud Storage as an Avro file, then use BigQuery's support for external data sources to query it.
## Explanation

**Options A and E are correct** because:

### Option A: Denormalize the data as much as possible
- **Performance**: Denormalized tables reduce JOIN operations, improving query performance for data science workloads.
- **Usability**: Data scientists can work with flat tables without complex JOIN logic.
- **ML model building**: Machine learning algorithms typically work better with denormalized feature sets.

### Option E: Copy a daily snapshot to Cloud Storage as an Avro file
- **Cost optimization**: External tables avoid BigQuery storage costs for the large dataset (1.5 PB initially, plus 3 TB/day growth).
- **Flexibility**: Data scientists can query external data without moving it into BigQuery.
- **Avro format**: Efficient for large datasets and preserves the schema.

**Why the other options are incorrect:**
- **Option B**: Preserving the structure would mean normalized tables with JOINs, which hurts performance for data science.
- **Option C**: BigQuery UPDATE operations are expensive and not suitable for frequent updates on large datasets.
- **Option D**: Appending status updates creates data duplication and complicates analysis.

This approach balances performance, cost, and usability for data science workloads on large datasets.
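To make the denormalization point in Option A concrete, here is a minimal sketch using a hypothetical transaction/customer schema (none of these field names come from the question). It flattens a normalized pair of records into the kind of single flat row a denormalized BigQuery table would expose, so no JOIN logic is needed when building ML features.

```python
# Hypothetical schema: a transaction referencing a customer by ID,
# as it might look in a normalized source system.

def denormalize(transaction, customers):
    """Join one transaction against a customer lookup and flatten the
    result into a single flat dict (one row per transaction)."""
    customer = customers[transaction["customer_id"]]
    return {
        "transaction_id": transaction["id"],
        "status": transaction["status"],
        "amount": transaction["amount"],
        # Fields that would otherwise require a JOIN at query time:
        "customer_name": customer["name"],
        "customer_region": customer["region"],
    }

customers = {7: {"name": "Acme", "region": "EMEA"}}
tx = {"id": 101, "customer_id": 7, "status": "SETTLED", "amount": 19.5}

row = denormalize(tx, customers)
# row now holds a flat record with no nested structure or foreign keys.
```

In BigQuery this flattening would typically be materialized once (for example via a scheduled `CREATE TABLE AS SELECT` with the JOIN), so data scientists query the flat table directly.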
Author: LeetQuiz
NO.30 You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.
A. Denormalize the data as much as possible.
B. Preserve the structure of the data as much as possible.
C. Use BigQuery UPDATE to further reduce the size of the dataset.
D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.
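For option E, the external table can be defined directly over the Avro snapshot in Cloud Storage. A minimal sketch of the BigQuery DDL, where the dataset, table, and bucket names are all hypothetical:

```sql
CREATE OR REPLACE EXTERNAL TABLE analytics.transactions_snapshot
OPTIONS (
  format = 'AVRO',
  uris = ['gs://example-bucket/transactions/snapshot/*.avro']
);
```

Because Avro files embed their own schema, no explicit column list is required in the table definition.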