You want to use BigQuery ML to build a machine learning model and create a Vertex AI endpoint to process streaming data in near-real time from multiple vendors. The data may contain invalid values. What is the best approach for this scenario?