
Answer-first summary for fast verification
Answer: Streaming job, PubSubIO, BigQueryIO, side-inputs
The correct answer is C. A streaming job is required because Cloud Pub/Sub delivers an unbounded stream of events; a batch job (option A) cannot process an unbounded source. PubSubIO reads the incoming events, and BigQueryIO serves double duty: it reads the static reference table and writes the enriched results back to BigQuery. Because the reference data set fits entirely in the memory of a single worker, it can be materialized as a side input and broadcast to every worker, letting each element of the main stream be enriched with an in-memory lookup. JdbcIO (option B) targets JDBC databases, not BigQuery. Side outputs (options B and D) solve a different problem, splitting one transform's output into multiple PCollections, and do not provide lookup data to a transform. That makes Streaming job, PubSubIO, BigQueryIO, and side-inputs the right combination.
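The enrichment step can be sketched in plain Python. This is a simulation of the side-input pattern, not the Beam API itself: the function name `enrich`, the field names, and the sample records are all hypothetical. In a real pipeline, the dict would come from a BigQueryIO read materialized as a dict-shaped side input, and `enrich` would be the body of a DoFn applied to the Pub/Sub stream.

```python
def enrich(event: dict, reference: dict) -> dict:
    """Join one streaming record against the in-memory reference table,
    the way a DoFn would read a dict-shaped side input."""
    enriched = dict(event)
    # In-memory lookup; safe because the reference set fits on one worker.
    enriched["product_name"] = reference.get(event["product_id"], "UNKNOWN")
    return enriched

# Reference data small enough to fit in a single worker's memory,
# as the question stipulates (sample values are illustrative).
reference_table = {"p1": "widget", "p2": "gadget"}

# Stand-in for the unbounded Pub/Sub stream.
stream = [{"product_id": "p1", "qty": 3}, {"product_id": "p9", "qty": 1}]

output = [enrich(e, reference_table) for e in stream]
```

Note how the reference table is passed alongside each element rather than joined shuffle-style; that is exactly the trade-off a side input makes when the lookup data is small.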
Author: LeetQuiz Editorial Team
You are tasked with designing a data processing pipeline using Apache Beam that will receive streaming data from Cloud Pub/Sub and enrich this data with static reference data stored in BigQuery. It is important to note that the reference data set is small enough to fit entirely in memory of a single worker machine. The final output of the pipeline, which includes the enriched data, should be written back to BigQuery for further analysis. Given these requirements, what type of job and specific transforms should you utilize in this pipeline?
A
Batch job, PubSubIO, side-inputs
B
Streaming job, PubSubIO, JdbcIO, side-outputs
C
Streaming job, PubSubIO, BigQueryIO, side-inputs
D
Streaming job, PubSubIO, BigQueryIO, side-outputs