
Answer-first summary for fast verification
Answer: 1. Establish two separate BigQuery external tables on Cloud Storage and Cloud Bigtable. 2. Join these tables via user fields in BigQuery, applying necessary filters.
Option D is correct because BigQuery can query data in place through external tables defined over Cloud Storage and Cloud Bigtable. The daily Cloud Spanner exports already sit in Cloud Storage, so one external table covers the user state and a second covers the Bigtable event logs. Joining the two on the user fields and filtering in the same query retrieves the required data without first copying it into another system. Option A adds a Dataproc cluster and a Spark job to operate, and options B and C add a Dataflow job that copies data, so all three involve extra data movement and processing and are less efficient.
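As a rough illustration of the join step, here is a minimal Python sketch that only assembles the SQL text; it does not call BigQuery. Everything specific in it is an assumption for illustration: the table names `shop.spanner_users_export` and `shop.bigtable_user_events`, a `user_id` string column in the exported Spanner data, a Bigtable row key read as a string of the form `<user_id>#<suffix>`, and a `@user_ids` array query parameter holding the users of interest. Both external tables are assumed to exist already.

```python
import re

# Hypothetical table names; both are assumed to already exist as BigQuery
# external tables (one over the Cloud Storage export, one over Bigtable).
USERS_TABLE = "shop.spanner_users_export"
EVENTS_TABLE = "shop.bigtable_user_events"

# Accept only simple dataset.table or project.dataset.table identifiers.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){1,2}$")


def build_join_query(users_table: str, events_table: str) -> str:
    """Return a SQL string joining the two external tables on the user ID."""
    for name in (users_table, events_table):
        if not _TABLE_NAME.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    # Assumptions: the export has a `user_id` STRING column, and the Bigtable
    # row key is exposed as a string shaped like "<user_id>#<suffix>".
    return (
        "SELECT u.user_id, e.*\n"
        f"FROM `{users_table}` AS u\n"
        f"JOIN `{events_table}` AS e\n"
        "  ON STARTS_WITH(e.rowkey, CONCAT(u.user_id, '#'))\n"
        "WHERE u.user_id IN UNNEST(@user_ids)"
    )


query = build_join_query(USERS_TABLE, EVENTS_TABLE)
print(query)
```

The filter on `@user_ids` stands in for the "necessary filters" in the answer; a real schema would dictate the actual join key and which Bigtable column families to select.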
Author: LeetQuiz Editorial Team
You are a data scientist at an e-commerce company specializing in shoes. The website uses Cloud Spanner for current user state information and Cloud Bigtable for logging user events. Daily backups of the Cloud Spanner data are exported to Cloud Storage. Your Data Science team is developing an ML model that requires data from both Cloud Spanner and Cloud Bigtable to be joined. What is the most efficient method to achieve this?
A
Utilize a Dataproc cluster to run a Spark job that extracts data from Cloud Bigtable and Cloud Storage for specific users.
B
Develop a Dataflow job to copy data from Cloud Bigtable and Cloud Spanner for specific users.
C
Copy data from Cloud Storage and Cloud Bigtable for specific users using a Dataflow job.
D
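1. Establish two separate BigQuery external tables on Cloud Storage and Cloud Bigtable. 2. Join these tables via user fields in BigQuery, applying necessary filters.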