
Explanation:
The recommended approach (option A) is to ingest vendor data through Pub/Sub and use Dataflow to process and sanitize it in real time before streaming it to BigQuery, where BigQuery ML can use it. Pub/Sub provides scalable, reliable ingestion for high-volume data from many sources, and Dataflow provides serverless stream processing for transforming and cleaning records, including handling invalid values, so the pipeline stays robust, scalable, and fault tolerant. Direct BigQuery streaming inserts (options B and D) load the data without a sanitization step, so they may not adequately address data quality issues. Processing with a Cloud Function (option C) is less suited than Dataflow to continuous, high-volume streaming in terms of scalability and fault tolerance.
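The sanitization step described above can be sketched as a plain Python function. This is only an illustration under stated assumptions: the message format (UTF-8 JSON) and the field names `vendor_id`, `event_time`, and `reading` are hypothetical and do not come from the question.

```python
import json
import math
from typing import Optional

# Hypothetical schema: these field names are assumptions for illustration only.
REQUIRED_FIELDS = ("vendor_id", "event_time", "reading")


def sanitize_message(payload: bytes) -> Optional[dict]:
    """Return a cleaned row ready for BigQuery, or None if the message is invalid."""
    # Reject payloads that are not valid UTF-8 JSON.
    try:
        record = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Reject anything that is not a JSON object.
    if not isinstance(record, dict):
        return None
    # Reject records with missing or empty required fields.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    # Reject readings that are not finite numbers.
    try:
        reading = float(record["reading"])
    except (TypeError, ValueError):
        return None
    if math.isnan(reading) or math.isinf(reading):
        return None
    # Keep only the expected columns, with normalized types.
    return {
        "vendor_id": str(record["vendor_id"]).strip(),
        "event_time": str(record["event_time"]),
        "reading": reading,
    }
```

In a Dataflow job written with Apache Beam, a function like this would typically sit between reading from Pub/Sub and writing to BigQuery, with the `None` results filtered out or routed to a separate dead-letter destination for inspection.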
You want to use BigQuery ML to build a machine learning model and set up a Vertex AI endpoint to handle near-real-time streaming data from multiple vendors. The data may contain invalid values. What is the optimal strategy to address this scenario?
A
Send vendor data to a Pub/Sub topic and use Dataflow to process and sanitize the data before streaming it to BigQuery.
B
Use BigQuery streaming inserts to load data from multiple vendors into a new BigQuery dataset. Configure your BigQuery ML model to use the ingestion dataset as the training data.
C
Send vendor data to a Pub/Sub topic and use a Cloud Function to process and store it in BigQuery.
D
Use BigQuery streaming inserts to load data from multiple vendors into the same BigQuery dataset where your BigQuery ML model is deployed.