
Answer-first summary for fast verification
Answer: Store the data in a Cloud Storage bucket. Design the processing pipelines to retrieve the data from the bucket.
The correct answer is D. The scenario has two key requirements: the data must be stored exactly as it was retrieved, and it must remain available for reprocessing when the source structure changes. Cloud Storage holds objects as opaque bytes and imposes no schema, so raw, unstructured data can be written without any transformation. This matches the ELT (Extract, Load, Transform) approach typically used in data lakes, where data is loaded first and transformed later. Options A and C send the data through the processing pipeline before storing it, so the original form is not preserved and the requirement to store data exactly as retrieved is not met. Option B stores the raw data in BigQuery, where tables are defined by a schema; this makes it a weaker fit than Cloud Storage for unstructured data whose format can change at any time.
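To make option D concrete, here is a minimal Python sketch of the "store first, process later" pattern. It assumes the `google-cloud-storage` client library is installed and credentials are configured. The function names, the `raw/<source>/<date>/` object layout, and the `.bin` suffix are hypothetical choices for illustration and do not come from the question.

```python
import hashlib
from datetime import datetime, timezone


def raw_object_name(source: str, payload: bytes, retrieved_at: datetime) -> str:
    """Build a date-partitioned object name for an untouched payload.

    The layout is an assumption for this sketch; any naming scheme works
    as long as the payload itself is not modified.
    """
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"raw/{source}/{retrieved_at:%Y/%m/%d}/{retrieved_at:%H%M%S}-{digest}.bin"


def land_raw(bucket_name: str, source: str, payload: bytes) -> str:
    """Upload the payload byte-for-byte; no parsing or schema is applied."""
    from google.cloud import storage  # needs the google-cloud-storage package

    name = raw_object_name(source, payload, datetime.now(timezone.utc))
    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_string(payload, content_type="application/octet-stream")
    return name


def read_raw(bucket_name: str, name: str) -> bytes:
    """Return the stored bytes so a pipeline (or a reprocessing run) can parse them."""
    from google.cloud import storage

    return storage.Client().bucket(bucket_name).blob(name).download_as_bytes()
```

Because parsing happens only after `read_raw`, a change in the source structure requires updating the pipeline code, not the stored objects, and older objects can be reprocessed with the new logic.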
Author: LeetQuiz Editorial Team
Your company is in the process of designing a data lake on Google Cloud Platform (GCP) and aims to develop diverse ingestion pipelines to collect unstructured data from various sources. Once the data is stored in GCP, it will undergo processing through several data pipelines to create a recommendation engine for end users on the company's website. The structure of the data from source systems can change at any time, necessitating a solution that can handle these changes. Additionally, the data must be stored exactly as it was retrieved to allow for reprocessing, should the data structure be incompatible with current processing pipelines. Given these requirements, how should you design the architecture to support this use case after retrieving the data?
A
Send the data through the processing pipeline, and then store the processed data in a BigQuery table for reprocessing.
B
Store the data in a BigQuery table. Design the processing pipelines to retrieve the data from the table.
C
Send the data through the processing pipeline, and then store the processed data in a Cloud Storage bucket for reprocessing.
D
Store the data in a Cloud Storage bucket. Design the processing pipelines to retrieve the data from the bucket.