
Answer-first summary for fast verification
Answer: Use Cloud Dataprep and configure the BigQuery tables as the source. Schedule a daily job to clean the data.
The question requires an automated daily process to clean dirty data that is already in BigQuery, while managing cost. Option D (Cloud Dataprep) is the best fit: it is purpose-built for data preparation and cleaning, reads BigQuery tables natively as a source, supports scheduled daily jobs, and is cost-effective for batch workloads compared with streaming solutions. The community discussion shows strong consensus for D (71% of votes), with the most-upvoted comments supporting Dataprep's suitability for this use case.

Option A (a streaming Cloud Dataflow pipeline) targets real-time processing, which is unnecessary for a daily batch and more expensive to run continuously. It also cleans data during ingestion, whereas the scenario states the dirty data is already in BigQuery. Option B adds needless complexity: triggering a Cloud Function from a Compute Engine instance requires an always-on VM, and Cloud Functions are poorly suited to large batch transformations. Option C relies solely on SQL, which may not express complex cleaning logic effectively. While some comments argue for A on the grounds of cleaning at ingestion time, the scenario's focus on data already landed in BigQuery makes D the best choice for daily cleaning at lower cost.
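To make the kind of cleaning concrete, the sketch below shows typical row-level transformations a daily Dataprep recipe (or any equivalent batch job) might apply: trimming whitespace, normalizing null-like placeholders, and de-duplicating records. It is a local, self-contained illustration only; the field names (`vehicle_id`, `fuel_level`) are hypothetical and not taken from the case study.

```python
def clean_rows(rows):
    """Trim whitespace, normalize null-ish values, and de-duplicate rows.

    A minimal stand-in for the transformations a scheduled Dataprep
    recipe might perform on dirty telemetry rows. Field names are
    hypothetical examples, not part of the TerramEarth schema.
    """
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                # Treat common "dirty" placeholders as missing values.
                if value.lower() in {"", "n/a", "null", "none"}:
                    value = None
            normalized[key] = value
        # Drop rows that lack a primary identifier.
        if normalized.get("vehicle_id") is None:
            continue
        # De-duplicate on the full normalized record.
        dedup_key = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        if dedup_key in seen:
            continue
        seen.add(dedup_key)
        cleaned.append(normalized)
    return cleaned

dirty = [
    {"vehicle_id": " TE-001 ", "fuel_level": "42"},
    {"vehicle_id": "TE-001", "fuel_level": "42"},  # duplicate after trimming
    {"vehicle_id": "N/A", "fuel_level": "17"},     # missing identifier
]
print(clean_rows(dirty))  # one clean, de-duplicated row survives
```

In a real deployment, this logic lives inside the Dataprep recipe itself and the schedule is configured in the Dataprep UI; no custom code or VM is needed, which is part of why D manages cost better than options A and B.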
Author: LeetQuiz Editorial Team
For the TerramEarth case study, a new architecture writes all incoming data to BigQuery. You have observed that the data is dirty and want to implement an automated daily process to ensure data quality while managing cost. What should you do?
A
Set up a streaming Cloud Dataflow job, receiving data by the ingestion process. Clean the data in a Cloud Dataflow pipeline.
B
Create a Cloud Function that reads data from BigQuery and cleans it. Trigger the Cloud Function from a Compute Engine instance.
C
Create a SQL statement on the data in BigQuery, and save it as a view. Run the view daily, and save the result to a new table.
D
Use Cloud Dataprep and configure the BigQuery tables as the source. Schedule a daily job to clean the data.