
You work as a Google Professional Data Engineer and receive monthly CSV files from a third-party vendor. The schema of these files changes every third month, so your data cleansing process must adapt dynamically. Your requirements for handling and transforming this data are: the transformations must run automatically on a set schedule, non-developer analysts must be able to modify them easily, and a graphical interface must be available for designing them. What should you do?
A. Use Dataprep by Trifacta to build and maintain the transformation recipes, and execute them on a scheduled basis.
B. Load each month's CSV data into BigQuery and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query.
C. Help the analysts write a Dataflow pipeline in Python to perform the transformation (a minimal Beam sketch follows the options). Store the Python code in a revision control system and modify it as the incoming data's schema changes.
D. Use Apache Spark on Dataproc to infer the schema of the CSV file before creating a DataFrame (see the PySpark sketch after the Beam example). Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading it into BigQuery.
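To make option C concrete, here is a minimal sketch of what such a Dataflow pipeline could look like. The bucket path, table name, target columns, and the `normalize_row` helper are all hypothetical illustrations, not part of the question; the sketch assumes this quarter's vendor file happens to match the target column order, which is exactly the mapping an engineer would have to edit when the schema changes.

```python
# Hedged sketch of option C: a Beam pipeline (runnable on Dataflow) that reads
# vendor CSVs, maps them onto a standard schema, and writes to BigQuery.
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical standard schema; this is what changes every third month.
TARGET_COLUMNS = ["customer_id", "order_date", "amount"]


def normalize_row(line: str) -> dict:
    """Parse one CSV line and map its values onto the standard schema."""
    values = next(csv.reader([line]))
    # Assumption: the vendor's column order matches TARGET_COLUMNS this
    # quarter; updating this mapping requires a Python code change.
    return dict(zip(TARGET_COLUMNS, values))


def run() -> None:
    options = PipelineOptions()  # pass --runner=DataflowRunner etc. in practice
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadCSV" >> beam.io.ReadFromText(
                "gs://example-bucket/vendor/*.csv", skip_header_lines=1)
            | "Normalize" >> beam.Map(normalize_row)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "example-project:vendor_dataset.orders",
                schema="customer_id:STRING,order_date:DATE,amount:NUMERIC",
            )
        )


if __name__ == "__main__":
    run()
```

Note that every schema change lands in `normalize_row`, a Python edit that non-developer analysts cannot make through a graphical interface, which is the weakness the question is probing.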
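Option D can likewise be sketched in PySpark. The bucket paths, view name, and column names below are hypothetical; the point is that `inferSchema` lets Spark derive column types from the file, after which the transformation to the standard schema is expressed in Spark SQL.

```python
# Hedged sketch of option D: infer the CSV schema on Dataproc, transform with
# Spark SQL, and write the result to Cloud Storage for a later BigQuery load.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vendor-csv-normalize").getOrCreate()

# Let Spark infer this month's schema instead of hard-coding it.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("gs://example-bucket/vendor/latest.csv")  # hypothetical path
)
df.createOrReplaceTempView("vendor_raw")

# Implement the transformation to the standard schema in Spark SQL.
normalized = spark.sql(
    """
    SELECT CAST(customer_id AS STRING)          AS customer_id,
           CAST(order_date  AS DATE)            AS order_date,
           CAST(amount      AS DECIMAL(18, 2))  AS amount
    FROM vendor_raw
    """
)

# Write to Cloud Storage; loading into BigQuery is a separate step.
normalized.write.mode("overwrite").parquet("gs://example-bucket/normalized/")
```

As with option C, the SQL and any column renames still live in code that an engineer must maintain, and there is no graphical design surface, so this approach also falls short of the stated requirements.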