
Answer-first summary for fast verification
Answer: Implement an Apache Beam custom connector to design a Dataflow pipeline that streams the data into BigQuery in Avro format.
The correct answer is **D**. Implementing an Apache Beam custom connector to design a Dataflow pipeline is the most efficient method for importing proprietary flight data into BigQuery with minimal resource usage. Apache Beam supports parallel data processing and integrates seamlessly with Google Cloud Dataflow, making it ideal for streaming data into BigQuery. The Avro format is particularly suited to this task due to its compact, efficient, schema-based data serialization.

- **Option A** is less efficient for continuous data streaming, as Cloud Functions are better suited to event-driven tasks than to handling large data streams.
- **Option B** involves unnecessary steps by storing raw data first and transforming it later, which could lead to higher resource consumption.
- **Option C** is not optimal because Apache Hive is primarily used for querying and analyzing data in HDFS or cloud storage, not for streaming data into BigQuery. Additionally, CSV is less efficient than Avro for large-scale data streaming.
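The core of such a custom connector is the logic that decodes the proprietary format into records matching the Avro/BigQuery schema; the sketch below illustrates that parsing step for a hypothetical fixed-width binary flight record (the field layout, names, and `parse_flight_records` function are invented for illustration, not part of any real aerospace format). In a real pipeline, a Beam `DoFn` or custom source would wrap this logic before writing the resulting rows to BigQuery.

```python
import struct
from typing import Iterator

# Hypothetical proprietary flight record layout (big-endian):
# 8-byte timestamp in microseconds, 4-char tail number,
# float32 altitude in meters, float32 speed in m/s.
RECORD_FMT = ">q4sff"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def parse_flight_records(blob: bytes) -> Iterator[dict]:
    """Decode fixed-width binary flight records into dicts whose keys
    mirror the Avro/BigQuery schema the pipeline would write."""
    for offset in range(0, len(blob), RECORD_SIZE):
        ts_us, tail, altitude, speed = struct.unpack_from(
            RECORD_FMT, blob, offset
        )
        yield {
            "event_time_us": ts_us,
            "tail_number": tail.decode("ascii"),
            "altitude_m": altitude,
            "speed_mps": speed,
        }
```

Inside a Dataflow pipeline, each parsed dict would flow through `beam.io.WriteToBigQuery`, letting Dataflow handle parallelism and the Avro-based load path; the parser itself stays a small, testable unit independent of the Beam runtime.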
Author: LeetQuiz Editorial Team
An aerospace company uses a unique data format for storing flight data. The challenge is to connect this new data source to BigQuery and stream the data efficiently. Which method would you choose to import the data into BigQuery with minimal resource consumption?
A. Develop a shell script to activate a Cloud Function that executes periodic ETL batch jobs on the new data source.
B. Employ a standard Dataflow pipeline to initially store the raw data in BigQuery, and then reformat it later when the data is accessed.
C. Utilize Apache Hive to create a Dataproc job that streams the data into BigQuery using CSV format.
D. Implement an Apache Beam custom connector to design a Dataflow pipeline that streams the data into BigQuery in Avro format.