
TerramEarth, a company specializing in heavy equipment for mining and agriculture, has 20 million vehicles in operation, each collecting 120 fields of data per second. The data is stored locally and accessed during vehicle maintenance. Approximately 200,000 vehicles transmit data over a cellular network, contributing about 9 TB per day. The existing systems run in a single US West Coast data center and process data with significant delay, causing downtime for customers. The company has introduced a new architecture that writes all incoming data to BigQuery but has noticed that the data is dirty. How can you ensure data quality on an automated daily basis while managing costs?
A
Set up a streaming Cloud Dataflow job that receives data from the ingestion process, and clean the data in the Dataflow pipeline (see the Beam sketch after the options).
B
Create a Cloud Function that reads data from BigQuery and cleans it. Trigger the Cloud Function from a Compute Engine instance.
C
Write a SQL statement that cleans the data in BigQuery and save it as a view. Run the view daily, and save the results to a new table (see the scheduled-query sketch after the options).
D
Use Cloud Dataprep and configure the BigQuery tables as the source. Schedule a daily job to clean the data.
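
For reference, a minimal sketch of the approach in option A, assuming the ingestion process publishes JSON messages to a Pub/Sub topic; the topic, table name, and cleaning rule below are invented for illustration and are not part of the question:

```python
# Sketch of option A: a streaming Dataflow (Apache Beam) pipeline that
# cleans telemetry before it lands in BigQuery. Assumes the destination
# table already exists.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        # Hypothetical topic fed by the vehicle ingestion process.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/telemetry")
        | "Parse" >> beam.Map(json.loads)
        # Illustrative cleaning rule: drop rows missing a key field.
        | "DropDirty" >> beam.Filter(lambda row: row.get("vehicle_id") is not None)
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:telemetry.clean_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```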
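
And a sketch of option C using the BigQuery Python client, assuming a hypothetical `telemetry.raw_events` table; the view name, destination table, and cleaning predicate are illustrative. The daily run could be triggered by BigQuery scheduled queries or Cloud Scheduler:

```python
# Sketch of option C: save the cleaning SQL as a view, then run it daily
# and materialize the results into a new table.
from google.cloud import bigquery

client = bigquery.Client()

# One-time setup: define the cleaning logic as a view.
client.query(
    """
    CREATE OR REPLACE VIEW `telemetry.clean_events` AS
    SELECT *
    FROM `telemetry.raw_events`
    WHERE vehicle_id IS NOT NULL                    -- drop rows missing a key field
      AND ts <= CURRENT_TIMESTAMP()                 -- discard bogus future timestamps
    """
).result()

# Daily job: query the view and overwrite the cleaned table.
job_config = bigquery.QueryJobConfig(
    destination="my-project.telemetry.clean_events_daily",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(
    "SELECT * FROM `telemetry.clean_events`", job_config=job_config
).result()
```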