
Answer-first summary for fast verification
Answer: Set up a Kafka Connect bridge between Kafka and Pub/Sub, then write a Dataflow pipeline that reads the data from Pub/Sub and writes it to BigQuery.
Option D is the correct answer. Setting up a Kafka Connect bridge between Kafka and Pub/Sub, then writing a Dataflow pipeline that reads from Pub/Sub and writes to BigQuery, provides lower latency and higher throughput. Pub/Sub acts as a buffer and automatically handles the scaling and reliability of the streaming data, which reduces the processing burden on the pipeline and minimizes latency compared with having the pipeline read directly from the on-premises Kafka cluster.
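As a rough sketch of the bridge side, the Kafka-to-Pub/Sub hop can be configured with Google's Pub/Sub Kafka sink connector (`CloudPubSubSinkConnector` from the pubsub-kafka-connector project). The connector, project, and topic names below are placeholders, not values from the question:

```properties
# Sink connector that forwards records from an on-prem Kafka topic to Pub/Sub.
# All names and values here are illustrative placeholders.
name=kafka-to-pubsub-bridge
connector.class=com.google.pubsub.kafka.sink.CloudPubSubSinkConnector
tasks.max=4
# On-prem Kafka topic to ingest
topics=events
# Destination project and Pub/Sub topic, read downstream by the Dataflow pipeline
cps.project=my-gcp-project
cps.topic=events-ingest
```

The Dataflow pipeline then subscribes to the Pub/Sub topic and streams the messages into BigQuery, so no pipeline worker ever has to reach back across the interconnect to Kafka.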
Author: LeetQuiz Editorial Team
Your infrastructure team has established an interconnect link between Google Cloud and the on-premises network, providing a direct and secure connection. You are tasked with designing a high-throughput streaming pipeline to ingest real-time data from an Apache Kafka cluster that is hosted within the on-premises environment. The goal is to store this data in BigQuery, Google Cloud's fully managed, serverless data warehouse, while ensuring minimal latency in the data transfer and storage process. What steps should you take to achieve this objective?
A
Set up a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub and write the data to BigQuery.
B
Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline that reads data from the proxy host and writes the data to BigQuery.
C
Use Dataflow to write a pipeline that reads the data from Kafka and writes the data to BigQuery.
D
Set up a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline that reads the data from Pub/Sub and writes the data to BigQuery.