
Answer-first summary for fast verification
Answer: Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink.
The correct answer is B. Dataflow is Google Cloud's managed service for both stream and batch processing, so it can apply programmatic longtail and outlier detection to ads data in near-real time, cleansing records before they reach your AI models. Writing the cleansed output to BigQuery as a sink makes it immediately available for AI model training and historical trend analysis. Options A and C rely on batch-oriented tooling (Cloud Storage with shell scripts, or BigQuery queries over already-ingested data) and cannot cleanse data in near-real time. Option D is wrong because Cloud Composer is a workflow orchestration service (managed Apache Airflow) for scheduling pipelines, not a streaming data-processing engine.
Author: LeetQuiz Editorial Team
In the context of digital advertising, accurate data is crucial for optimizing AI models and performing effective historical data analysis. Suppose you have a dataset of ads data that you need for two primary purposes: serving AI models and analyzing historical trends. A significant part of data preparation is identifying longtail and outlier data points, which can skew both the analysis and the performance of AI models. To ensure the highest data quality, you want to cleanse the data in near-real time before feeding it into your AI models. What should you do?
A
Use Cloud Storage as a data warehouse, shell scripts for processing, and BigQuery to create views for desired datasets.
B
Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink.
C
Use BigQuery to ingest, prepare, and then analyze the data, and then run queries to create views.
D
Use Cloud Composer to identify longtail and outlier data points, and then output a usable dataset to BigQuery.
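As a sketch of the per-record logic option B implies, the function below flags outliers with a simple z-score test. In a real Dataflow job this would run inside a Beam transform (e.g. a `ParDo`) with the output written to BigQuery; the field name `clicks` and the threshold of 3.0 are illustrative assumptions, not part of the question.

```python
from statistics import mean, stdev

def flag_outliers(records, field="clicks", z_threshold=3.0):
    """Split records into (clean, outliers) using a z-score test on `field`.

    Illustrative only: a production Dataflow pipeline would compute
    statistics with streaming/approximate aggregations rather than over
    an in-memory batch, then route the clean branch to a BigQuery sink.
    """
    values = [r[field] for r in records]
    mu = mean(values)
    sigma = stdev(values) if len(values) > 1 else 0.0
    clean, outliers = [], []
    for r in records:
        z = abs(r[field] - mu) / sigma if sigma else 0.0
        (outliers if z > z_threshold else clean).append(r)
    return clean, outliers
```

A record far outside the bulk of the distribution (e.g. 1000 clicks among values near 10) lands in the outlier branch, while the rest pass through as clean data for model training.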