Google Professional Data Engineer

You are migrating your existing on-premises data to BigQuery on Google Cloud. Depending on the use case, you plan to use either streaming or batch loading to transfer the data. You must also mask some sensitive data before loading it into BigQuery to meet security and compliance requirements, and you need a cost-effective, programmatic solution. What should you do?




Explanation:

Option C is the correct answer. Running an Apache Beam (Python SDK) pipeline on Dataflow gives you programmatic control over the data transfer, and Beam's unified model supports both streaming and batch processing, so the same pipeline can serve either use case. Dataflow's serverless execution model scales resources with demand, which keeps costs low. Finally, the pipeline can call Cloud Data Loss Prevention (Cloud DLP) to mask sensitive fields before the data is written to BigQuery, preserving privacy and compliance. A sketch of such a pipeline follows.
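The following is a minimal sketch of this approach, not a definitive implementation: a batch Beam pipeline that reads text records, de-identifies them through the Cloud DLP API, and writes the masked results to BigQuery. The project ID, bucket paths, table name, schema, and the choice of info types (EMAIL_ADDRESS, PHONE_NUMBER) are all hypothetical placeholders, not values from the question.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

PROJECT_ID = "my-project"  # hypothetical project ID


class MaskSensitiveData(beam.DoFn):
    """Masks sensitive values in each record via the Cloud DLP API."""

    def setup(self):
        # Create one DLP client per worker rather than per element.
        from google.cloud import dlp_v2

        self._dlp = dlp_v2.DlpServiceClient()
        self._parent = f"projects/{PROJECT_ID}"

    def process(self, line):
        # Replace detected email addresses and phone numbers with '#'.
        response = self._dlp.deidentify_content(
            request={
                "parent": self._parent,
                "inspect_config": {
                    "info_types": [
                        {"name": "EMAIL_ADDRESS"},
                        {"name": "PHONE_NUMBER"},
                    ]
                },
                "deidentify_config": {
                    "info_type_transformations": {
                        "transformations": [
                            {
                                "primitive_transformation": {
                                    "character_mask_config": {
                                        "masking_character": "#"
                                    }
                                }
                            }
                        ]
                    }
                },
                "item": {"value": line},
            }
        )
        yield response.item.value


def run():
    options = PipelineOptions(
        runner="DataflowRunner",  # the same code runs locally with DirectRunner
        project=PROJECT_ID,
        region="us-central1",
        temp_location="gs://my-bucket/temp",  # hypothetical bucket
        streaming=False,  # set True for a streaming source such as Pub/Sub
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/export/*.csv")
            | "MaskWithDLP" >> beam.ParDo(MaskSensitiveData())
            | "ToRow" >> beam.Map(lambda line: {"record": line})
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                f"{PROJECT_ID}:staging.masked_records",  # hypothetical table
                schema="record:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```

The per-element DLP call keeps the sketch simple; in a real pipeline, batching multiple records into each de-identification request would reduce API cost, and swapping the text source for a Pub/Sub read with streaming=True covers the streaming use case with the same transform.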