
You work as a Google Professional Data Engineer and receive monthly CSV files from a third-party vendor. The schema of these files changes every third month, so your data cleansing process must adapt dynamically. Your requirements for handling and transforming this data are: the transformations must run automatically on a set schedule, non-developer analysts must be able to modify them easily, and a graphical interface must be available for designing them. What should you do?
A. Use Dataprep by Trifacta to build and maintain the transformation recipes, and execute them on a scheduled basis.
B. Load each month's CSV data into BigQuery and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query.
C. Help the analysts write a Dataflow pipeline in Python to perform the transformation (a minimal Beam sketch follows the options). Store the Python code in a revision control system and modify it as the incoming data's schema changes.
D. Use Apache Spark on Dataproc to infer the schema of the CSV file before creating a DataFrame (see the PySpark sketch after the Beam example). Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading it into BigQuery.
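To make option C concrete, here is a minimal sketch of what such a Dataflow pipeline could look like. The bucket path, table name, target columns, and the `normalize_row` helper are all hypothetical illustrations, not part of the question; the sketch assumes this quarter's vendor file happens to match the target column order, which is exactly the mapping an engineer would have to edit when the schema changes.

```python
# Hedged sketch of option C: a Beam pipeline (runnable on Dataflow) that reads
# vendor CSVs, maps them onto a standard schema, and writes to BigQuery.
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical standard schema; this is what changes every third month.
TARGET_COLUMNS = ["customer_id", "order_date", "amount"]


def normalize_row(line: str) -> dict:
    """Parse one CSV line and map its values onto the standard schema."""
    values = next(csv.reader([line]))
    # Assumption: the vendor's column order matches TARGET_COLUMNS this
    # quarter; updating this mapping requires a Python code change.
    return dict(zip(TARGET_COLUMNS, values))


def run() -> None:
    options = PipelineOptions()  # pass --runner=DataflowRunner etc. in practice
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadCSV" >> beam.io.ReadFromText(
                "gs://example-bucket/vendor/*.csv", skip_header_lines=1)
            | "Normalize" >> beam.Map(normalize_row)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "example-project:vendor_dataset.orders",
                schema="customer_id:STRING,order_date:DATE,amount:NUMERIC",
            )
        )


if __name__ == "__main__":
    run()
```

Note that every schema change lands in `normalize_row`, a Python edit that non-developer analysts cannot make through a graphical interface, which is the weakness the question is probing.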
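Option D can likewise be sketched in PySpark. The bucket paths, view name, and column names below are hypothetical; the point is that `inferSchema` lets Spark derive column types from the file, after which the transformation to the standard schema is expressed in Spark SQL.

```python
# Hedged sketch of option D: infer the CSV schema on Dataproc, transform with
# Spark SQL, and write the result to Cloud Storage for a later BigQuery load.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vendor-csv-normalize").getOrCreate()

# Let Spark infer this month's schema instead of hard-coding it.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("gs://example-bucket/vendor/latest.csv")  # hypothetical path
)
df.createOrReplaceTempView("vendor_raw")

# Implement the transformation to the standard schema in Spark SQL.
normalized = spark.sql(
    """
    SELECT CAST(customer_id AS STRING)          AS customer_id,
           CAST(order_date  AS DATE)            AS order_date,
           CAST(amount      AS DECIMAL(18, 2))  AS amount
    FROM vendor_raw
    """
)

# Write to Cloud Storage; loading into BigQuery is a separate step.
normalized.write.mode("overwrite").parquet("gs://example-bucket/normalized/")
```

As with option C, the SQL and any column renames still live in code that an engineer must maintain, and there is no graphical design surface, so this approach also falls short of the stated requirements.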