
Answer-first summary for fast verification
Answer: Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format.
Option D is the best approach given the requirements. Writing a Dataflow pipeline with an Apache Beam custom connector lets you integrate the proprietary data source without building extra infrastructure, and Dataflow's managed autoscaling keeps resource consumption low for stream processing. Avro adds efficient binary serialization and supports schema evolution. The alternatives fall short: periodic batch jobs (Option A) do not satisfy the streaming requirement, storing raw data and transforming it later (Option B) wastes storage and defers the work, and a Dataproc Hive job (Option C) requires manual cluster management and uses CSV, which is less efficient than Avro.
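To make the approach concrete, here is a minimal sketch of the parsing step a custom connector would wrap. The fixed-width "proprietary" record layout, the field names, and the `ReadFromProprietarySource` connector named in the comments are all illustrative assumptions, not part of the question; the Beam wiring (`ParDo`/`Map` and `WriteToBigQuery`) is shown in comments only.

```python
# Sketch of the parse step a Beam custom connector would wrap.
# The fixed-width record layout below is a made-up stand-in for the
# proprietary flight-data format: 6-byte tail number, 8-byte altitude
# in feet, 10-byte unix timestamp.
from datetime import datetime, timezone

def parse_flight_record(raw: bytes) -> dict:
    """Decode one fixed-width record into a row dict whose fields
    mirror the Avro/BigQuery schema."""
    text = raw.decode("ascii")
    return {
        "tail_number": text[0:6].strip(),
        "altitude_ft": int(text[6:14]),
        "recorded_at": datetime.fromtimestamp(
            int(text[14:24]), tz=timezone.utc
        ).isoformat(),
    }

# In the real pipeline this function would run inside a Beam transform,
# with ReadFromProprietarySource being the hypothetical custom connector:
#
#   (p | "Read"  >> ReadFromProprietarySource(...)
#      | "Parse" >> beam.Map(parse_flight_record)
#      | "Write" >> beam.io.WriteToBigQuery(
#            table, schema=...,
#            method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS))

record = b"N12345" + b"00035000" + b"1700000000"
print(parse_flight_record(record))
```

The design point is that only the parse/decode logic is format-specific; Dataflow's runner handles scaling, and BigQuery ingestion is a stock Beam sink.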
Author: LeetQuiz Editorial Team
An aerospace company utilizes a proprietary data format for storing its flight data. As a data engineer, your task is to establish a connection between this new data source and Google BigQuery. Subsequently, you need to stream the flight data into BigQuery. The objective is to efficiently import the data while minimizing resource consumption. What approach should you take?
A. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source.
B. Use a standard Dataflow pipeline to store the raw data in BigQuery, and then transform the format later when the data is used.
C. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format.
D. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format.