
Answer-first summary for fast verification
Answer: Employ Dataflow for programmatic identification of longtail and outlier data points, using BigQuery as the destination.
The correct answer is **B**. Dataflow is Google Cloud's managed service for streaming and batch data processing, making it well suited to programmatically identifying longtail and outlier data points in near-real time. BigQuery serves as a scalable, cost-effective data warehouse for analytics, acting as the sink for the cleansed data, ready for AI model training or further analysis.

- **Option A** suggests Cloud Composer, which, while feasible for orchestrating workflows, is not designed for near-real-time data processing the way Dataflow is.
- **Option C** relies on BigQuery alone for ingestion, preparation, and analysis, which does not efficiently address the need for near-real-time identification of anomalies.
- **Option D** combines Cloud Storage, shell scripts, and BigQuery views, which lacks the efficiency and real-time processing capability that Dataflow provides.
Author: LeetQuiz Editorial Team
You are tasked with preparing ads data for AI models and historical analytics, where identifying longtail and outlier data points is crucial. The data requires near-real-time cleansing before AI model usage. Which approach should you adopt for data cleansing?
A
Utilize Cloud Composer to pinpoint longtail and outlier data points, then export a clean dataset to BigQuery.
B
Employ Dataflow for programmatic identification of longtail and outlier data points, using BigQuery as the destination.
C
Leverage BigQuery for data ingestion, preparation, and analysis, followed by query execution to generate views.
D
Adopt Cloud Storage as a data warehouse, process data with shell scripts, and use BigQuery to create dataset views.
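To make option B concrete, here is a minimal sketch of the kind of outlier-flagging logic a Dataflow (Apache Beam) transform could apply to each batch of ad metrics before writing the clean rows to BigQuery. The z-score rule, the sample values, and the threshold are illustrative assumptions, not part of the question; a production pipeline would wrap this in a Beam `DoFn` and use the BigQuery I/O connector as the sink.

```python
import statistics


def flag_outliers(values, z_threshold=3.0):
    """Return a boolean per value: True if its z-score exceeds the threshold.

    Illustrative stand-in for logic that would run inside an Apache Beam
    DoFn on Dataflow; flagged records would be filtered out (or routed to
    a side output) before the clean dataset is written to BigQuery.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) / stdev > z_threshold for v in values]


# Hypothetical ad-click counts; 500 is an obvious outlier.
clicks = [12, 15, 11, 14, 13, 500]
flags = flag_outliers(clicks, z_threshold=2.0)
clean = [v for v, is_outlier in zip(clicks, flags) if not is_outlier]
```

A simple z-score cut works for this sketch; real ads data with heavy longtail distributions often calls for robust alternatives such as percentile or median-absolute-deviation thresholds.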