
Google Professional Data Engineer
Your organization uses an on-premises Apache Hadoop cluster to store customer data in Apache Parquet format, and daily data processing is handled by Apache Spark jobs running on that cluster. As part of your migration strategy, both the Spark jobs and the Parquet data need to move to Google Cloud. BigQuery will be the platform for future data transformation pipelines, so the Parquet data must be accessible within BigQuery. You want to use managed services to simplify the migration while minimizing changes to the ETL processing and keeping overhead costs low. What should you do?
Exam-Like
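
The pattern this scenario points toward is: copy the Parquet files from HDFS to Cloud Storage (for example with Hadoop DistCp over the Cloud Storage connector or the Storage Transfer Service), run the existing Spark jobs on Dataproc so they need minimal code changes, and expose the Parquet files to BigQuery as an external table rather than loading them. As a minimal sketch of that last step, assuming hypothetical identifiers (my-project, an analytics dataset, and a my-bucket bucket; substitute your own), the BigQuery Python client can define an external table over the Parquet files:

```python
from google.cloud import bigquery

# Hypothetical identifiers for illustration only.
PROJECT = "my-project"
TABLE_ID = f"{PROJECT}.analytics.customer_data"
PARQUET_URIS = ["gs://my-bucket/customer_data/*.parquet"]

client = bigquery.Client(project=PROJECT)

# Describe the Parquet files in Cloud Storage as an external data source.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = PARQUET_URIS

# Create a table that queries the files in place -- no load job needed,
# so the Spark jobs on Dataproc can keep writing Parquet unchanged.
table = bigquery.Table(TABLE_ID)
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

print(f"External table {TABLE_ID} now reads {PARQUET_URIS[0]}")
```

Because the data stays in Cloud Storage, the Dataproc Spark jobs and the BigQuery pipelines read the same files, which fits the stated constraints: managed services, minimal ETL changes, and low overhead cost.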