
Answer-first summary for fast verification
Answer: Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
The correct answer is **C**. Here's why:

- **Dataflow with the Apache Beam SDK for Python** offers the flexibility to create pipelines for both streaming and batch processing, allowing you to tailor the process to your specific needs.
- Integrating **Cloud DLP** within your Dataflow pipeline enables you to mask sensitive data before it is loaded into BigQuery, ensuring data privacy and compliance.
- Choosing **BigQuery as your data sink** ensures that the processed data is efficiently stored for analysis.

**Why the other options are not ideal:**

- **A**: Datastream is designed for real-time data replication, not for the initial loading of existing on-premises data, and it provides no data masking.
- **B**: The BigQuery Data Transfer Service is best for scheduled transfers and lacks the flexibility for programmatic data loading and masking before transfer.
- **D**: Cloud Data Fusion does not offer the same flexibility for customizing streaming and batch processing options or for programmatic data masking within the pipeline.
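A minimal sketch of what option C looks like in practice. The masking function below is a simple regex stand-in for a real Cloud DLP `deidentify_content` call, and the bucket, table, and field names are hypothetical; the commented pipeline skeleton assumes `apache-beam[gcp]` is installed.

```python
import re


def mask_sensitive(record):
    """Redact the local part of an email address before loading to BigQuery.

    This is a stand-in for a Cloud DLP de-identify call; in a real pipeline
    you would invoke the DLP API (e.g. via dlp_client.deidentify_content)
    inside a DoFn instead of a regex.
    """
    record = dict(record)
    record["email"] = re.sub(r"[^@]+", "***", record["email"], count=1)
    return record


# Hedged sketch of the Dataflow pipeline (paths and table are hypothetical).
# Passing streaming=True in the pipeline options switches the same code from
# batch to streaming mode, which is the flexibility option C relies on.
#
# import json
# import apache_beam as beam
# from apache_beam.options.pipeline_options import PipelineOptions
#
# opts = PipelineOptions(streaming=False)  # toggle per workload
# with beam.Pipeline(options=opts) as p:
#     (p
#      | beam.io.ReadFromText("gs://my-bucket/export.jsonl")
#      | beam.Map(json.loads)
#      | beam.Map(mask_sensitive)           # de-identify before the sink
#      | beam.io.WriteToBigQuery("my_project:my_dataset.my_table"))
```

The key design point is that the de-identification step runs inside the pipeline, so sensitive values never reach BigQuery in the clear, unlike option B where data lands first and is cleaned afterward.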
Author: LeetQuiz Editorial Team
You are planning to migrate your on-premises data to BigQuery on Google Cloud, with the flexibility to either stream or batch-load data as per your needs. Additionally, you need to obfuscate certain sensitive data before the transfer. Your objective is to achieve this programmatically while keeping costs to a minimum. What is the most efficient and cost-effective approach to accomplish this task?
A
Set up Datastream to replicate your on-premises data to BigQuery.
B
Use the BigQuery Data Transfer Service to schedule your migration. After the data is in BigQuery, connect to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
C
Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
D
Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.