
Answer-first summary for fast verification
Answer: Send vendor data to a Pub/Sub topic and use Dataflow to process and sanitize the data before streaming it to BigQuery.
The scenario calls for a BigQuery ML model and a Vertex AI endpoint fed by near-real-time streaming data from multiple vendors, some of which may contain invalid values. The recommended approach is to ingest the data through Pub/Sub and use Dataflow to process and sanitize it in flight before streaming it to BigQuery. Pub/Sub provides scalable, reliable ingestion for high-volume data from many sources, and Dataflow provides managed, autoscaling stream processing for transformation and cleaning, so invalid records are handled before they reach the tables the model uses. Direct BigQuery streaming inserts (options B and D) load the vendor data as-is, with no validation step, so invalid values can end up in the training data. A Cloud Function (option C) can process Pub/Sub messages, but it is a weaker fit than Dataflow for a continuous, high-volume streaming pipeline that needs fault-tolerant processing.
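As a rough illustration of the sanitization step, the sketch below shows the kind of per-message validation a Dataflow pipeline might apply between the Pub/Sub read and the BigQuery write. It is plain Python, not a complete pipeline. The field names (`vendor_id`, `event_time`, `value`) and the validation rules are assumptions made for the example; the question does not specify a schema.

```python
import json
import math
from typing import Optional

# Hypothetical schema: these field names are illustrative only.
REQUIRED_FIELDS = ("vendor_id", "event_time", "value")


def sanitize_record(raw: bytes) -> Optional[dict]:
    """Return a cleaned row, or None if the message should be dropped."""
    # Reject payloads that are not valid UTF-8 JSON.
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None

    # Reject payloads that are not JSON objects.
    if not isinstance(record, dict):
        return None

    # Reject records with missing or empty required fields.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None

    # Reject values that cannot be read as a finite number.
    try:
        value = float(record["value"])
    except (TypeError, ValueError):
        return None
    if math.isnan(value) or math.isinf(value):
        return None

    # Emit only the expected columns, normalized.
    return {
        "vendor_id": str(record["vendor_id"]).strip(),
        "event_time": str(record["event_time"]),
        "value": value,
    }
```

In an Apache Beam pipeline running on Dataflow, a function like this could be applied to each message after `ReadFromPubSub`, with `None` results filtered out (or routed to a dead-letter destination) before `WriteToBigQuery`. That wiring is an assumption about one reasonable design, not something the question prescribes.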
Author: LeetQuiz Editorial Team
You want to build a machine learning model with BigQuery ML and create a Vertex AI endpoint to serve it. The endpoint must handle near-real-time streaming data from multiple vendors, and that data may contain invalid values. What is the best approach?
A
Send vendor data to a Pub/Sub topic and use Dataflow to process and sanitize the data before streaming it to BigQuery.
B
Use BigQuery streaming inserts to load data from multiple vendors into a new BigQuery dataset. Configure your BigQuery ML model to use the ingestion dataset as the training data.
C
Send vendor data to a Pub/Sub topic and use a Cloud Function to process and store it in BigQuery.
D
Use BigQuery streaming inserts to load data from multiple vendors into the same BigQuery dataset where your BigQuery ML model is deployed.