
Answer-first summary for fast verification
Answer: Create a Pub/Sub topic for all vendor data, employ Dataflow to process and sanitize the Pub/Sub data, and then stream it to BigQuery.
The correct answer is D. A Pub/Sub topic decouples data ingestion from processing, allowing vendors to publish asynchronously. Dataflow then performs real-time processing and sanitization, filtering or correcting invalid values before the cleaned records are streamed into BigQuery. This pattern supports continuous, near-real-time analysis of streaming data.

- **Option A** is incorrect because it provides no mechanism for processing and sanitizing the data before it reaches the model; pointing the BigQuery ML model at a raw 'ingestion' dataset as its training data lets invalid values flow straight into training.
- **Option B** is incorrect because it omits the processing and sanitization step entirely, streaming raw vendor data (including invalid values) directly into the dataset the model uses.
- **Option C** is incorrect because, although a Cloud Function can process messages from the topic, it lacks Dataflow's support for autoscaling, windowing, and complex transformations, making it a poorer fit for continuous streaming data that may contain invalid values.
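To make the sanitization step in option D concrete, here is a minimal Python sketch of the validation logic a Dataflow pipeline might apply to each Pub/Sub message before writing to BigQuery. The schema (`vendor_id`, `reading`, `event_time`), the valid range, and the function name are all hypothetical illustrations; in a real pipeline this logic would live inside an Apache Beam `DoFn` between the Pub/Sub read and the BigQuery write.

```python
# Hypothetical sanitization step for a Pub/Sub -> Dataflow -> BigQuery
# pipeline. Invalid records are dropped (returned as None) rather than
# being streamed into the table that feeds the BigQuery ML model.
import json

REQUIRED_FIELDS = {"vendor_id", "reading", "event_time"}  # assumed schema


def sanitize_record(message: bytes):
    """Parse one Pub/Sub message; return a clean row dict, or None if
    the record is invalid and should be dropped or dead-lettered."""
    try:
        record = json.loads(message)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None  # unparseable payload

    # Drop records missing any required field.
    if not REQUIRED_FIELDS.issubset(record):
        return None

    # Reject non-numeric or out-of-range readings (range is illustrative).
    try:
        reading = float(record["reading"])
    except (TypeError, ValueError):
        return None
    if not (0.0 <= reading <= 1000.0):
        return None

    return {
        "vendor_id": str(record["vendor_id"]),
        "reading": reading,
        "event_time": record["event_time"],
    }
```

In a Beam pipeline the same function could be applied with a `FlatMap` that yields the row when it is not `None`, so invalid messages never reach the streaming insert.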
Author: LeetQuiz Editorial Team
You are developing a machine learning model using BigQuery ML and plan to host the model with Vertex AI to handle continuous streaming data from various vendors, which may include invalid values. What is the best approach to manage potentially invalid values in the data stream?
A
Establish a new BigQuery dataset for streaming inserts from multiple vendors and configure your BigQuery ML model to utilize the 'ingestion' dataset as the training data.
B
Directly use BigQuery streaming inserts to land data from multiple vendors into the dataset where your BigQuery ML model is deployed.
C
Set up a Pub/Sub topic for all vendor data and link a Cloud Function to the topic to process and store the data in BigQuery.
D
Create a Pub/Sub topic for all vendor data, employ Dataflow to process and sanitize the Pub/Sub data, and then stream it to BigQuery.